ABSTRACT
Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. However, the high memory demands of DBTs present an obstacle for all platforms, and especially embedded systems. The memory demand is typically controlled by placing a limit on cached translations and forcing the DBT to flush all translations upon reaching the limit. This solution manifests as a performance inefficiency because many flushed translations require retranslation. Ideally, translations should be selectively flushed to minimize retranslations for a given memory limit. However, three obstacles exist: (1) it is difficult to predict which selections will minimize retranslation, (2) selective flushing results in greater book-keeping overheads than full flushing, and (3) the emergence of multicore processors and multi-threaded programming complicates most flushing algorithms. These issues have led to the widespread adoption of full flushing as a standard protocol. In this paper, we present a partial flushing approach aimed at reducing retranslation overhead and improving overall performance, given a fixed memory budget. Our technique applies uniformly to single-threaded and multi-threaded guest applications.
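The trade-off between full and selective flushing can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes a hypothetical recency-based policy that keeps the most recently used translations, and the names `simulate`, `capacity`, `keep_fraction`, and the synthetic `trace` are all invented for illustration. It counts how many translations a bounded cache performs when a few hot blocks are interleaved with one-off cold blocks.

```python
from collections import OrderedDict


def simulate(trace, capacity, keep_fraction):
    """Count translations performed by a bounded code cache.

    keep_fraction = 0.0 models a full flush. A value in (0, 1) models a
    hypothetical partial flush that keeps the most recently used blocks
    (an assumed policy, not the one proposed in the paper).
    """
    cache = OrderedDict()  # guest block id -> None, least recently used first
    translations = 0
    for block in trace:
        if block in cache:
            # Hit: record recency. This per-hit update is the kind of
            # book-keeping overhead that full flushing avoids.
            cache.move_to_end(block)
            continue
        if len(cache) >= capacity:
            # Limit reached: flush down to the number of blocks we keep.
            keep = int(capacity * keep_fraction)
            while len(cache) > keep:
                cache.popitem(last=False)  # drop least recently used
        cache[block] = None
        translations += 1
    return translations


# Synthetic trace: four hot blocks executed repeatedly, each round followed
# by one cold block that is never executed again.
trace = []
for i in range(100):
    trace.extend([0, 1, 2, 3])
    trace.append(100 + i)

full = simulate(trace, capacity=8, keep_fraction=0.0)
partial = simulate(trace, capacity=8, keep_fraction=0.5)
print("translations with full flush:   ", full)
print("translations with partial flush:", partial)
```

On this trace the full flush repeatedly discards the hot blocks and must retranslate them, while the recency-based partial flush retains them. A different trace or a different selection policy can reverse the result, which reflects the difficulty of predicting which selections minimize retranslation.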
Index Terms
- Balancing memory and performance through selective flushing of software code caches