research-article

Conservation cores: reducing the energy of mature computations

Authors:
Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, Michael Bedford Taylor

University of California, San Diego, San Diego, USA

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010, Pages 205–218
https://doi.org/10.1145/1736020.1736044

Published: 13 March 2010

ABSTRACT

Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
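
The gap between the function-level and whole-application figures in the abstract is what one would expect when only part of a program runs on specialized hardware. The sketch below is not the paper's evaluation model. It is a simple Amdahl-style estimate in which `coverage` (the fraction of baseline energy spent in code moved to a c-core) is a hypothetical input, and both function names are made up for illustration. The second helper illustrates the abstract's point that lower per-computation power lets more computations run under a fixed power budget.

```python
def whole_app_energy_reduction(coverage: float, function_reduction: float) -> float:
    """Amdahl-style estimate of whole-application energy reduction.

    coverage: fraction of baseline energy spent in code that moves to
        specialized hardware (hypothetical input, between 0 and 1).
    function_reduction: energy reduction factor for that code (e.g. 16.0).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if function_reduction <= 0.0:
        raise ValueError("function_reduction must be positive")
    # Energy left over, as a fraction of the baseline: the uncovered part is
    # unchanged, the covered part shrinks by the function-level factor.
    remaining = (1.0 - coverage) + coverage / function_reduction
    return 1.0 / remaining


def tasks_under_budget(power_budget_w: float, power_per_task_w: float) -> int:
    """Number of computations that fit under a fixed power budget."""
    if power_per_task_w <= 0.0:
        raise ValueError("power_per_task_w must be positive")
    return int(power_budget_w // power_per_task_w)


if __name__ == "__main__":
    # Assumed coverage of 50%; 16.0x is the upper-bound function-level
    # figure quoted in the abstract.
    print(whole_app_energy_reduction(0.5, 16.0))
    # Assumed budget and per-task power values, chosen only for illustration.
    print(tasks_under_budget(10.0, 2.0), tasks_under_budget(10.0, 0.5))
```

Under this model, even a large function-level reduction yields a much smaller whole-application reduction unless coverage is close to 1, which is consistent in spirit with the two figures reported above.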

Index Terms

Computer systems organization > Architectures > Other architectures > Heterogeneous (hybrid) systems

Published in
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010, 422 pages
ISBN: 9781605588391
DOI: 10.1145/1736020
General Chair: James C. Hoe, Carnegie Mellon University, USA
Program Chair: Vikram S. Adve, University of Illinois at Urbana-Champaign, USA

ACM SIGPLAN Notices Volume 45, Issue 3 (ASPLOS '10)
March 2010, 399 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1735971

ACM SIGARCH Computer Architecture News Volume 38, Issue 1 (ASPLOS '10)
March 2010, 399 pages
ISSN: 0163-5964
DOI: 10.1145/1735970

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
conservation core
heterogeneous many-core
patching
utilization wall
Acceptance Rates
ASPLOS XV Paper Acceptance Rate: 32 of 181 submissions, 18%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%