EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system

Published: 10 June 2007

Abstract

Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general-purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general-purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power.

We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel® Core™ 2 Duo processor and an 8-core 32-thread Intel® Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel® C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to 10.97X) over execution on the IA32 CPU alone.
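
The abstract states that the runtime can spread parallel computation across the heterogeneous cores but does not say how the work is divided. As a rough illustration only, the C sketch below splits a loop's iteration space between two classes of hardware threads in proportion to an assumed relative throughput. The names `seq_class`, `range`, and `split_iterations` and the per-thread weights are hypothetical; they are not part of CHI or EXO, and the actual runtime policy may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one class of hardware thread
 * (for example, CPU cores or accelerator threads). */
typedef struct {
    const char *name;   /* label, for illustration only */
    unsigned count;     /* number of hardware threads in this class */
    unsigned weight;    /* ASSUMED relative throughput of one thread */
} seq_class;

/* Half-open iteration range [begin, end). */
typedef struct {
    size_t begin;
    size_t end;
} range;

/* Split n loop iterations across k classes in proportion to
 * count * weight. The last class absorbs any rounding remainder,
 * so the ranges are contiguous and cover [0, n) exactly.
 * Note: n * (sum of weights) must fit in unsigned long long. */
void split_iterations(size_t n, const seq_class *cls, size_t k, range *out)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < k; i++)
        total += (unsigned long long)cls[i].count * cls[i].weight;
    assert(k > 0 && total > 0);

    unsigned long long cum = 0;  /* cumulative weight so far */
    size_t start = 0;
    for (size_t i = 0; i < k; i++) {
        cum += (unsigned long long)cls[i].count * cls[i].weight;
        size_t stop = (i + 1 == k)
            ? n
            : (size_t)((unsigned long long)n * cum / total);
        out[i].begin = start;
        out[i].end = stop;
        start = stop;
    }
}
```

With the prototype's thread counts (2 CPU cores and 32 accelerator threads) and assumed per-thread weights of 4 and 1, a 1000-iteration loop would be split into [0, 200) for the CPU cores and [200, 1000) for the accelerator. A static split like this ignores load imbalance and data movement, which a real runtime would likely need to account for.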

      • Published in

        ACM SIGPLAN Notices, Volume 42, Issue 6
        Proceedings of the 2007 PLDI conference
        June 2007
        491 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1273442

        PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2007
        508 pages
        ISBN: 9781595936332
        DOI: 10.1145/1250734

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • article
