DOI: 10.1145/339647.339696

Piranha: a scalable architecture based on single-chip multiprocessing

Published: 01 May 2000

ABSTRACT

The microprocessor industry is currently struggling with higher development costs and longer design times that arise from exceedingly complex processors that are pushing the limits of instruction-level parallelism. Meanwhile, such designs are especially ill-suited for important commercial applications, such as on-line transaction processing (OLTP), which suffer from large memory stall times and exhibit little instruction-level parallelism. Given that commercial applications constitute by far the most important market for high-performance servers, the above trends emphasize the need to consider alternative processor designs that specifically target such workloads. The abundance of explicit thread-level parallelism in commercial workloads, along with advances in semiconductor integration density, identifies chip multiprocessing (CMP) as potentially the most promising approach for designing processors targeted at commercial servers.

This paper describes the Piranha system, a research prototype being developed at Compaq that aggressively exploits chip multiprocessing by integrating eight simple Alpha processor cores along with a two-level cache hierarchy onto a single chip. Piranha also integrates further on-chip functionality to allow for scalable multiprocessor configurations to be built in a glueless and modular fashion. The use of simple processor cores combined with an industry-standard ASIC design methodology allows us to complete our prototype within a short time-frame, with a team size and investment that are an order of magnitude smaller than those of a commercial microprocessor. Our detailed simulation results show that while each Piranha processor core is substantially slower than an aggressive next-generation processor, the integration of eight cores onto a single chip allows Piranha to outperform next-generation processors by up to 2.9 times (on a per-chip basis) on important workloads such as OLTP. This performance advantage can approach a factor of five by using full-custom instead of ASIC logic. In addition to exploiting chip multiprocessing, the Piranha prototype incorporates several other unique design choices including a shared second-level cache with no inclusion, a highly optimized cache coherence protocol, and a novel I/O architecture.
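As a rough illustration of the per-chip figures quoted above, the following sketch divides the chip-level speedup evenly across the eight cores. This is only back-of-the-envelope arithmetic: the even split assumes perfectly linear scaling across cores, which the abstract does not claim, and the function name `per_core_share` is hypothetical. The values 2.9 and 5.0 are the peak ("up to") and approximate ("approach a factor of five") figures from the abstract, not measured per-core results.

```python
def per_core_share(chip_speedup: float, cores: int) -> float:
    """Average throughput of one core relative to one baseline chip.

    Assumption (not from the paper): chip throughput divides evenly
    across cores, i.e. scaling is perfectly linear.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return chip_speedup / cores


CORES = 8  # eight Alpha cores per Piranha chip (from the abstract)

# Peak per-chip advantage with ASIC logic, as stated in the abstract.
asic_share = per_core_share(2.9, CORES)

# Approximate per-chip advantage with full-custom logic ("a factor of five").
custom_share = per_core_share(5.0, CORES)

print(f"ASIC core vs. next-generation chip:        {asic_share:.4f}")
print(f"Full-custom core vs. next-generation chip: {custom_share:.4f}")
```

Under that assumption, each simple core would deliver well under half the OLTP throughput of an aggressive next-generation processor, which is consistent with the abstract's statement that each core is substantially slower while the chip as a whole is faster.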

        • Published in

          ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
          June 2000
          327 pages
          ISBN: 1581132328
          DOI: 10.1145/339647

          ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
          Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
          May 2000
          325 pages
          ISSN: 0163-5964
          DOI: 10.1145/342001

          Copyright © 2000 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 543 of 3,203 submissions, 17%
