Two hardware-based approaches for deterministic multiprocessor replay

Published: 1 June 2009

Abstract

Modern computer systems are inherently nondeterministic because of a variety of events that occur during an execution, including I/O, interrupts, and DMA fills. The lack of repeatability that arises from this nondeterminism can make it difficult to develop and maintain correct software. Furthermore, the impact of nondeterminism is likely to grow in the coming years, because commodity systems are now shared-memory multiprocessors. Such systems are affected not only by the sources of nondeterminism present in uniprocessors but also by the outcomes of memory races among concurrent threads.

To ease the pain of developing software in a nondeterministic environment, researchers have proposed adding deterministic replay capabilities to computer systems. A system with deterministic replay records enough information during an execution for a replayer to later recreate an equivalent execution, despite the inherent sources of nondeterminism. The ability to replay an execution verbatim could enable many new applications:

Debugging: Deterministic replay could be used to provide the illusion of a time-travel debugger that has the ability to selectively execute both forward and backward in time.

Security: Deterministic replay could also be used to enhance the security of software by providing the means for an in-depth analysis of an attack, hopefully leading to rapid patch deployment and a reduction in the economic impact of new threats.

Fault Tolerance: With the ability to replay an execution, it may also be possible to build hot-standby systems for critical service providers using commodity hardware. A virtual machine (VM) could, for example, be fed, in real time, the replay log of a primary server running on a physically separate machine. The standby VM could use the replay log to mimic the primary's execution, so that if the primary fails, the backup can take over with almost zero downtime.
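The core record/replay idea, logging enough about each nondeterministic event to reproduce it later, can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not either of the hardware approaches the article describes. Threads are simulated as lists of shared-memory operations, a seeded pseudo-random scheduler stands in for nondeterministic hardware timing, and the recorder logs which thread performed each shared-memory access. The names `racy_increment`, `make_threads`, and `run` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# A shared-memory operation mutates the shared memory in place.
Op = Callable[[Dict[str, int]], None]


def racy_increment() -> List[Op]:
    """Hypothetical thread body: a non-atomic `x += 1` split into a load and a store."""
    reg: Dict[str, int] = {}  # thread-private "register"

    def load(mem: Dict[str, int]) -> None:
        reg["r"] = mem["x"]

    def store(mem: Dict[str, int]) -> None:
        mem["x"] = reg["r"] + 1

    return [load, store]


def make_threads(n: int = 3) -> List[List[Op]]:
    """Build n fresh copies of the racy thread body."""
    return [racy_increment() for _ in range(n)]


def run(threads: List[List[Op]],
        rng: Optional[random.Random] = None,
        log: Optional[List[int]] = None) -> Tuple[Dict[str, int], List[int]]:
    """Interleave the threads' operations on a fresh shared memory.

    Record mode (no log): a pseudo-random scheduler picks the next thread,
    and the order of thread IDs is returned as the log.
    Replay mode (log given): the recorded order is enforced instead.
    """
    if log is None and rng is None:
        rng = random.Random()
    memory: Dict[str, int] = {"x": 0}
    pcs = [0] * len(threads)  # index of each thread's next operation
    order: List[int] = []
    for step in range(sum(len(t) for t in threads)):
        if log is not None:
            tid = log[step]  # replay: follow the recorded order
        else:
            ready = [i for i, t in enumerate(threads) if pcs[i] < len(t)]
            tid = rng.choice(ready)  # record: the "nondeterministic" choice
        threads[tid][pcs[tid]](memory)
        pcs[tid] += 1
        order.append(tid)
    return memory, order


if __name__ == "__main__":
    final, log = run(make_threads(), rng=random.Random(7))
    replayed, _ = run(make_threads(), log=log)
    print(final, replayed, log)
```

Recording with different seeds can yield different final values of `x` (lost updates from the race), while replaying a recorded log reproduces that run's value exactly. Logging one entry per shared-memory access, as done here, is far more than a practical recorder could afford; the sketch only shows that capturing the outcome of each race is sufficient for an equivalent replay.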


Published in Communications of the ACM, Volume 52, Issue 6 (June 2009), "One Laptop Per Child: Vision vs. Reality"
128 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1516046

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
