Balancing memory and performance through selective flushing of software code caches

Authors:
Apala Guha, University of Virginia, Charlottesville, Virginia, USA
Kim Hazelwood, University of Virginia, Charlottesville, Virginia, USA
Mary Soffa, University of Virginia, Charlottesville, Virginia, USA

CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, October 2010, Pages 1–10
https://doi.org/10.1145/1878921.1878923

Published: 24 October 2010

ABSTRACT

Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. However, the high memory demands of DBTs present an obstacle for all platforms, and especially embedded systems. The memory demand is typically controlled by placing a limit on cached translations and forcing the DBT to flush all translations upon reaching the limit. This solution manifests as a performance inefficiency because many flushed translations require retranslation. Ideally, translations should be selectively flushed to minimize retranslations for a given memory limit. However, three obstacles exist: (1) it is difficult to predict which selections will minimize retranslation, (2) selective flushing results in greater book-keeping overheads than full flushing, and (3) the emergence of multicore processors and multi-threaded programming complicates most flushing algorithms. These issues have led to the widespread adoption of full flushing as a standard protocol. In this paper, we present a partial flushing approach aimed at reducing retranslation overhead and improving overall performance, given a fixed memory budget. Our technique applies uniformly to single-threaded and multi-threaded guest applications.
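
The retranslation cost that the abstract attributes to full flushing can be illustrated with a toy simulation. The sketch below is not the authors' technique, which the abstract does not describe. It assumes a single-threaded guest, unit-sized translations, and a simple oldest-first (FIFO) choice of victims; the names `simulate`, `demo_trace`, and `flush_fraction` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def simulate(trace, capacity, flush_fraction):
    """Count how many translations a trace of code-block IDs triggers.

    Assumptions (illustrative only): every translation occupies one slot,
    `capacity` is the memory limit in slots, and victims are chosen in
    oldest-first order. flush_fraction=1.0 models a full flush; smaller
    values model a partial flush of the oldest share of the cache.
    """
    cache = OrderedDict()  # keys kept in translation (insertion) order
    translations = 0
    for block in trace:
        if block in cache:
            continue  # translation already cached, nothing to do
        if len(cache) >= capacity:
            # Limit reached: evict at least one, at most all, translations.
            n_evict = max(1, int(len(cache) * flush_fraction))
            for _ in range(n_evict):
                cache.popitem(last=False)  # drop the oldest translation
        cache[block] = True
        translations += 1
    return translations


def demo_trace(iterations):
    """Three hot blocks re-executed between one-off cold blocks."""
    trace = []
    for i in range(iterations):
        trace.extend([0, 1, 2, 100 + i])
    return trace


if __name__ == "__main__":
    trace = demo_trace(10)
    unique = len(set(trace))
    for fraction in (1.0, 0.25):
        total = simulate(trace, capacity=4, flush_fraction=fraction)
        # Every translation beyond the first per block is a retranslation.
        print(f"flush_fraction={fraction}: "
              f"{total} translations, {total - unique} retranslations")
```

On this trace the full flush discards the hot blocks every time the limit is reached, so it performs more translations than the partial policy at the same capacity. The sketch ignores the book-keeping and multi-threading obstacles that the abstract lists, both of which a real DBT would have to address.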

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Published in
CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
October 2010
276 pages
ISBN: 9781605589039
DOI: 10.1145/1878921
Program Chairs:
Vinod Kathail, USA
Reid Tatge, Texas Instruments, USA
Rajeev Barua, University of Maryland, College Park, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
code cache
dynamic binary translation
eviction
flushing
software dynamic translation
virtual execution environments
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 52 of 230 submissions, 23%