
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system

Authors:
Perry H. Wang, Intel, Santa Clara, CA
Jamison D. Collins, Intel, Santa Clara, CA
Gautham N. Chinya, Intel, Hillsboro, OR
Hong Jiang, Intel, Folsom, CA
Xinmin Tian, Intel, Santa Clara, CA
Milind Girkar, Intel, Santa Clara, CA
Nick Y. Yang, Intel, Folsom, CA
Guei-Yuan Lueh, Intel, Santa Clara, CA
Hong Wang, Intel, Santa Clara, CA

ACM SIGPLAN Notices, Volume 42, Issue 6, June 2007, pp. 156–166. https://doi.org/10.1145/1273442.1250753

Published: 10 June 2007


Abstract

Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, programming such a heterogeneous multi-core platform remains a significant challenge, because these accelerators have ISAs and functionality that differ substantially from those of the general-purpose CPU cores. In this paper, we present EXOCHI, which consists of two parts: (1) Exoskeleton Sequencer (EXO), an architecture that represents heterogeneous accelerators as ISA-based MIMD architecture resources, together with a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general-purpose CPU cores; and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreaded programming and produces a single fat binary with code sections corresponding to the different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power.
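The fat-binary idea can be illustrated with a minimal sketch in C, assuming a simple in-memory table that pairs each ISA tag with an implementation of the same kernel, so that a runtime can pick the entry matching the core it dispatches to. All names here (`isa_t`, `code_section`, `select_section`, `run_on`) are hypothetical and do not describe the actual CHI binary layout or runtime interface; the "accelerator" section is ordinary C so that the sketch runs on a host CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ISA tags; not the EXOCHI/CHI binary format. */
typedef enum { ISA_IA32, ISA_ACCEL } isa_t;

/* Every code section implements the same kernel signature. */
typedef void (*kernel_fn)(const unsigned char *in, unsigned char *out, size_t n);

/* Stand-in for the IA-32 code section: invert 8-bit pixel values. */
static void invert_ia32(const unsigned char *in, unsigned char *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(255 - in[i]);
}

/* Stand-in for the accelerator code section. In a real fat binary this would
   be accelerator-ISA code; here it is plain C computing the same result. */
static void invert_accel(const unsigned char *in, unsigned char *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(in[i] ^ 0xFFu);
}

/* One entry of the (hypothetical) fat binary: an ISA tag plus its code. */
typedef struct {
    isa_t isa;
    kernel_fn fn;
} code_section;

static const code_section invert_sections[] = {
    { ISA_IA32,  invert_ia32  },
    { ISA_ACCEL, invert_accel },
};

/* Return the section compiled for `target`, or NULL if none exists. */
kernel_fn select_section(const code_section *sections, size_t count, isa_t target) {
    for (size_t i = 0; i < count; i++)
        if (sections[i].isa == target)
            return sections[i].fn;
    return NULL;
}

/* Dispatch the inversion kernel to the code section for `target`. */
void run_on(isa_t target, const unsigned char *in, unsigned char *out, size_t n) {
    kernel_fn fn = select_section(invert_sections,
                                  sizeof invert_sections / sizeof invert_sections[0],
                                  target);
    assert(fn != NULL && "no code section for this ISA");
    fn(in, out, n);
}
```

Keeping every section behind one function-pointer signature is what lets a single call site run the same kernel on either kind of core; that choice is made here for illustration only.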

We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel® Core™ 2 Duo processor and an 8-core, 32-thread Intel® Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel® C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to use the accelerator through the CHI programming interface, achieving significant speedups (1.41X to 10.97X) over execution on the IA-32 CPU alone.
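The prototype's accelerator exposes 32 hardware threads on 8 cores, i.e. 32 / 8 = 4 threads per core. A minimal sketch of how an image kernel's rows might be divided among those threads, assuming a simple static block partition in which range sizes differ by at most one row; the constants and the names `row_range`, `block_range`, and `thread_core` are hypothetical, and this is not the paper's scheduling policy.

```c
#include <assert.h>
#include <stddef.h>

/* Accelerator shape from the prototype description: 8 cores x 4 threads.
   The constant names are hypothetical. */
enum { ACCEL_CORES = 8, THREADS_PER_CORE = 4,
       ACCEL_THREADS = ACCEL_CORES * THREADS_PER_CORE };

/* Half-open range of image rows [begin, end) assigned to one thread. */
typedef struct { size_t begin, end; } row_range;

/* Static block partition: the first (rows % parts) threads get one extra row,
   so the ranges are contiguous and cover every row exactly once. */
row_range block_range(size_t rows, size_t parts, size_t idx) {
    assert(parts > 0 && idx < parts);
    size_t base  = rows / parts;
    size_t extra = rows % parts;
    size_t begin = idx * base + (idx < extra ? idx : extra);
    size_t len   = base + (idx < extra ? 1 : 0);
    row_range r = { begin, begin + len };
    return r;
}

/* Which accelerator core a hardware thread index belongs to,
   assuming threads are numbered core by core. */
size_t thread_core(size_t thread_idx) {
    return thread_idx / THREADS_PER_CORE;
}
```

For a 1080-row frame, 1080 = 32 × 33 + 24, so threads 0 to 23 each receive 34 rows and threads 24 to 31 each receive 33 rows.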

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
ACM SIGPLAN Notices, Volume 42, Issue 6
Proceedings of the 2007 PLDI conference
June 2007
491 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273442

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN: 9781595936332
DOI: 10.1145/1250734
General Chair: Jeanne Ferrante, University of California, San Diego, USA
Program Chair: Kathryn S. McKinley, University of Texas at Austin, USA
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
heterogeneous multi-cores
openMP
Article Metrics
Total Citations: 114
Total Downloads: 4,260
Downloads (last 12 months): 27
Downloads (last 6 weeks): 9