DOI: 10.1109/CGO.2009.18
Article

Fast Track: A Software System for Speculative Program Optimization

Published: 22 March 2009

ABSTRACT

Fast Track is a software speculation system that enables unsafe optimization of sequential code. It speculatively runs optimized code to improve performance and checks the correctness of the speculative code by running the original program on multiple processors. We present the interface design and system implementation of Fast Track. The interface lets a programmer or a profiling tool mark fast-track code regions; a run-time system manages the parallel execution of the speculative process and its checking processes and ensures the correct display of program outputs. The core of the run-time system is a novel concurrent algorithm that balances exploitable parallelism against available processors when the fast track is too slow or too fast. The programming interface closely affects the run-time support: our system permits both explicit and implicit end markers for speculatively optimized code regions, as well as extensions that allow multiple tracks and user-defined correctness checking. We discuss possible uses of speculative optimization and demonstrate the effectiveness of our prototype system with examples of unsafe semantic optimization and a general system for fast memory-safety checking, which reduces checking time by factors between 2 and 7 for large sequential code on an 8-CPU system.
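
The dual-track idea sketched in the abstract can be pictured with a small example: one process runs an unsafely optimized version of a code region and continues speculatively, while a forked checking process replays the original version of the region and reports whether the two agree. The C sketch below is only an illustration of that idea under a fork-based design; the function names fast_region and slow_region and the exit-code protocol are assumptions for illustration, not the paper's actual API.

/*
 * Minimal sketch of speculative fast-track execution with a forked
 * checking process.  All names here are hypothetical illustrations.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Unsafely optimized "fast track": closed form for the sum 0..n-1. */
static long fast_region(long n) { return n * (n - 1) / 2; }

/* Original "normal track": the trusted but slower computation. */
static long slow_region(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i;
    return s;
}

int main(void)
{
    long n = 10000;

    /* Fast track: run the optimized region and continue speculatively. */
    long fast_result = fast_region(n);

    /* Checking process: replay the region with the original code and
     * report agreement (exit 0) or mismatch (exit 1) to the parent.   */
    pid_t checker = fork();
    if (checker == 0)
        _exit(slow_region(n) == fast_result ? 0 : 1);

    /* ... speculative execution continues here; its output would be
     * buffered until the checker confirms correctness ...             */

    int status = 0;
    waitpid(checker, &status, 0);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        printf("fast track verified: %ld\n", fast_result);
    else
        printf("mismatch: discard speculative state, rerun normal track: %ld\n",
               slow_region(n));
    return 0;
}

In the real system, the checking side would also compare memory side effects of the region, and the runtime would balance the number of outstanding checking processes against available processors, as the abstract describes.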

References

  1. L. Rauchwerger and D. Padua, "The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, La Jolla, CA, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Gupta and R. Nim, "Techniques for run-time parallelization of loops," in Proceedings of SC'98, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. H. Cintra and D. R. Llanos, "Design space exploration of a software speculative parallelization scheme," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 6, pp. 562-576, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang, "Software behavior-oriented parallelization," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, USA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Tian, M. Feng, V. Nagarajan, and R. Gupta, "Copy or discard execution model for speculative parallelization on multicores," in Proceedings of the ACM/IEEE International Symposium on Microarchitecture, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. B. Zilles and G. S. Sohi, "Master/slave speculative parallelization." in Proceedings of the ACM/IEEE International Symposium on Microarchitecture, 2002, pp. 85-96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Zhai, C. B. Colohan, J. G. Steffan, and T. C. Mowry, "Compiler optimization of scalar value communication between speculative threads," in Proceedings of the International Conference on Architectual Support for Programming Languages and Operating Systems, 2002, pp. 171-183. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Quinones, C. Madriles, J. Sánchez, P. Marcuello, A. González, and D. M. Tullsen, "Mitosis compiler: an infrastructure for speculative threading based on precomputation slices," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. New York, NY, USA: ACM, 2005, pp. 269-279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Neelakantam, R. Rajwar, S. Srinivas, U. Srinivasan, and C. B. Zilles, "Hardware atomicity for reliable software speculation," in Proceedings of the International Symposium on Computer Architecture, 2007, pp. 174-185. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers, October 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Zhang, K. Kelsey, X. Shen, C. Ding, M. Hertz, and M. Ogihara, "Program-level adaptive memory management," in Proceedings of the International Symposium on Memory Management, Ottawa, Canada, June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Grzegorczyk, S. Soman, C. Krintz, and R. Wolski, "Isla vista heap sizing: Using feedback to avoid paging," in Proceedings of the International Symposium on Code Generation and Optimization, 2007, pp. 325-340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Michie, "Memo functions and machine learning," Nature, vol. 218, pp. 19-22, 1968.Google ScholarGoogle Scholar
  14. Y. Ding and Z. Li, "A compiler scheme for reusing intermediate computation results." in Proceedings of the International Symposium on Code Generation and Optimization, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C.-K. Luk et al., "Pin: Building customized program analysis tools with dynamic instrumentation," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Chicago, Illinois, June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. Peri, "Shadow profiling: Hiding instrumentation costs with parallelism," in Proceedings of the International Symposium on Code Generation and Optimization, 2007, pp. 198-208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Wallace and K. Hazelwood, "Superpin: Parallelizing dynamic instrumentation for real-time performance," in Proceedings of the International Symposium on Code Generation and Optimization, 2007, pp. 209-220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. E. B. Nightingale, D. Peek, P. M. Chen, and J. Flinn, "Parallelizing security checks on commodity hardware," in Proceedings of the International Conference on Architectual Support for Programming Languages and Operating Systems, 2008, pp. 308-318. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Patil and C. Fischer, "Efcient run-time monitoring using shadow processing," 1995, presented at AADEBUG'95.Google ScholarGoogle Scholar
  20. J.-Y. Tsai, Z. Jiang, and P.-C. Yew, "Compiler techniques for the superthreaded architectures," International Journal of Parallel Programming, vol. 27, no. 1, pp. 1-19, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. Sundaramoorthy, Z. Purser, and E. Rotenberg, "Slipstream processors: Improving both performance and fault tolerance." in Proceedings of the International Conference on Architectual Support for Programming Languages and Operating Systems, 2000, pp. 257-268. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S.-W. Liao, P. H. Wang, H. Wang, J. P. Shen, G. Hoflehner, and D. M. Lavery, "Post-pass binary adaptation for software-based speculative precomputation," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2002, pp. 117-128. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Garg and M. Huang, "A performance-correctness explicitly-decoupled architecture," in Proceedings of the ACM/IEEE International Symposium on Microarchitecture, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. T. Oplinger and M. S. Lam, "Enhancing software reliability with speculative threads," in Proceedings of the International Conference on Architectual Support for Programming Languages and Operating Systems, 2002, pp. 184-196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas, "iwatcher: Efficient architectural support for software debugging," in Proceedings of the International Symposium on Computer Architecture, 2004, pp. 224-237. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Lee and J. Tuck, "Parallelizing Mudflap using thread-level speculation on a CMP," 2008, presented at the Workshop on the Parallel Execution of Sequential Programs on Multicore Architecture, co-located with ISCA.Google ScholarGoogle Scholar
  27. J. Mellor-Crummey, "Compile-time support for efficient data race detection in shared memory parallel programs," Rice University, Tech. Rep. CRPC-TR92232, September 1992.Google ScholarGoogle Scholar
  28. D. Perkovic and P. J. Keleher, "A protocol-centric approach to on-the-fly race detection," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 10, pp. 1058-1072, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Wahbe, S. Lucco, and S. L. Graham, "Practical data breakpoints: design and implementation," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Ding and K. Kennedy, "Improving cache performance in dynamic applications through data and computation reorganization at run time," in Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation , Atlanta, GA, May 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Arnold and B. G. Ryder, "A framework for reducing the cost of instrumented code," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. T. M. Chilimbi and M. Hirzel, "Dynamic hot data stream prefetching for general-purpose programs," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Berlin, Germany, June 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Li, "Shared virtual memory on loosely coupled multiprocessors," Ph.D. dissertation, Dept. of Computer Science, Yale University, New Haven, CT, Sep. 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel, "TreadMarks: Distributed shared memory on standard workstations and operating systems," in Proceedings of the 1994 Winter USENIX Conference, Jan. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library

Published in

CGO '09: Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
March 2009
299 pages
ISBN: 9780769535760

Publisher

IEEE Computer Society, United States

Publication History

Published: 22 March 2009

Qualifiers

Article

Acceptance Rates

CGO '09 Paper Acceptance Rate: 26 of 70 submissions, 37%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%
