skip to main content
10.1145/1950365.1950376acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

RCDC: a relaxed consistency deterministic computer

Published:05 March 2011Publication History

ABSTRACT

Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that enforce determinism in current hardware. Such work has incidentally imposed strong memory-ordering properties. Historically, memory ordering has been relaxed in favor of higher performance in shared memory multiprocessors and, interestingly, determinism exacerbates the cost of strong memory ordering. Consequently, we argue that relaxed memory ordering is vital to achieving faster deterministic execution.

This paper introduces RCDC, a deterministic multiprocessor architecture that takes advantage of relaxed memory orderings to provide high-performance deterministic execution with low hardware complexity. RCDC has two key innovations: a hybrid HW/SW approach to enforcing determinism; and a new deterministic execution strategy that leverages data-race-free-based memory models (e.g., the models for Java and C++) to improve performance and scalability without sacrificing determinism, even in the presence of races. In our hybrid HW/SW approach, the only hardware mechanisms required are software-controlled store buffering and support for precise instruction counting; we do not require speculation. A runtime system uses these mechanisms to enforce determinism for arbitrary programs.

We evaluate RCDC using PARSEC benchmarks and show that relaxing memory ordering leads to performance and scalability close to nondeterministic execution without requiring any form of speculation. We also compare our new execution strategy to one based on TSO (total-store-ordering) and show that some applications benefit significantly from the extra relaxation. We also evaluate a software-only implementation of our new deterministic execution strategy.

References

  1. S. Adve and M. Hill. Weak Ordering -- A New Definition. In ISCA, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Altekar and I. Stoica. ODR: Output-Deterministic Replay for Multicore Debugging. In SOSP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: a Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Bergan, N. Hunt, L. Ceze, and S. Gribble. Deterministic Process Groups in dOS. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe and Efficient Concurrent Programming. In OOPSLA, 2009.Google ScholarGoogle Scholar
  7. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In PACT, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Blelloch. NESL: A Nested Data-Parallel Language (Version 3.1). Technical report, Carnegie Mellon University, Pittsburgh, PA, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Bocchino, V. Adve, D. Dig, S. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H.-J. Boehm and S. Adve. Foundations of the C++ Concurrency Memory Model. In PLDI, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas. BulkSC: Bulk Enforcement of Sequential Consistency. In ISCA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. M. T. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Workshop on Declarative Aspects of Multicore Programming (DAMP), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Cui, J. Wu, C. che Tsai, and J. Yang. Stable Deterministic Multithreading through Schedule Memoization. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Dubois, J. Wang, L. Barroso, K. Lee, and Y. Chen. Delayed Consistency and Its Effects on the Miss Rate of Parallel Programs. In Supercomputing, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. A. Edwards and O. Tardieu. SHIM: A Deterministic Model for Heterogeneous Embedded Systems. In EMSOFT, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Gniady, B. Falsafi, and T. N. Vijaykumar. Is SC ILP = RC? In ISCA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Hower, P. Dudnik, D. Wood, and M. Hill. Calvin: Deterministic or Not? Free Will to Choose. In HPCA, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. Hower and M. Hill. ReRun: Exploiting Episodes for Lightweight Memory Race Recording. In ISCA, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Manson, W. Pugh, and S. Adve. The Java Memory Model. In POPL, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In ISCA, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Narayanasamy, C. Pereira, and B. Calder. Recording Shared Memory Dependencies Using Strata. In ASPLOS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In PADD, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, 1982.Google ScholarGoogle Scholar
  28. S. Parka, W. Xiong, Z. Yin, R. Kaushik, K. Lee, S. Lu, and Y. Zhou. Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Rinard and M. Lam. The Design, Implementation, and Evaluation of Jade. ACM TOPLAS, 20(3), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. Russell and D. Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In OOPSLA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. E. Smith, and M. Valero. Implementing Kilo-Instruction Multiprocessors. In ICPS, July 2005.Google ScholarGoogle ScholarCross RefCross Ref
  33. C. von Praun, H. Cain, J. Choi, and K. Ryu. Conditional Memory Ordering. In ISCA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos. Mechanisms for Store-wait-free Multiprocessors. In ISCA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. Xu, R. Bodik, and M. Hill. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. In ISCA, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Xu, M. Hill, and R. Bodik. A Regulated Transitive Reduction for Longer Memory Race Recording. In ASPLOS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. RCDC: a relaxed consistency deterministic computer

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
          March 2011
          432 pages
          ISBN:9781450302661
          DOI:10.1145/1950365
          • cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 46, Issue 3
            ASPLOS '11
            March 2011
            407 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/1961296
            Issue’s Table of Contents
          • cover image ACM SIGARCH Computer Architecture News
            ACM SIGARCH Computer Architecture News  Volume 39, Issue 1
            ASPLOS '11
            March 2011
            407 pages
            ISSN:0163-5964
            DOI:10.1145/1961295
            Issue’s Table of Contents

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 5 March 2011

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate535of2,713submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader