Abstract
Industry is building larger, more complex, manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration that support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research prototypes from 1 core to 500 million cores. OpenPiton is the world's first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry-hardened OpenSPARC T1 core with modifications and builds upon it with a scratch-built, scalable uncore, creating a flexible, modern manycore design. In addition, OpenPiton provides synthesis and backend scripts for ASIC and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verification infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and is written in industry-standard Verilog. Multiple implementations of OpenPiton have been created, including a taped-out 25-core implementation in IBM's 32nm process and multiple Xilinx FPGA prototypes.
OpenPiton: An Open Source Manycore Research Framework
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems