ABSTRACT
Fast Track is a software speculation system that enables unsafe optimization of sequential code. It speculatively runs optimized code to improve performance and then checks the correctness of the speculative code by running the original program on multiple processors. We present the interface design and system implementation of Fast Track. The interface lets a programmer or a profiling tool mark fast-track code regions, while the run-time system manages the parallel execution of the speculative process and its checking processes and ensures the correct display of program outputs. At the core of the run-time system is a novel concurrent algorithm that balances exploitable parallelism against available processors when the fast track is too slow or too fast. The programming interface closely affects the run-time support: our system permits both explicit and implicit end markers for speculatively optimized code regions, as well as extensions that allow multiple tracks and user-defined correctness checking. We discuss possible uses of speculative optimization and demonstrate the effectiveness of our prototype system through examples of unsafe semantic optimization and a general system for fast memory-safety checking, which reduces checking time by factors between 2 and 7 for large sequential code on an 8-CPU system.