ABSTRACT
Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that enforce determinism in current hardware. Such work has incidentally imposed strong memory-ordering properties. Historically, memory ordering has been relaxed in favor of higher performance in shared memory multiprocessors and, interestingly, determinism exacerbates the cost of strong memory ordering. Consequently, we argue that relaxed memory ordering is vital to achieving faster deterministic execution.
This paper introduces RCDC, a deterministic multiprocessor architecture that takes advantage of relaxed memory orderings to provide high-performance deterministic execution with low hardware complexity. RCDC has two key innovations: a hybrid HW/SW approach to enforcing determinism; and a new deterministic execution strategy that leverages data-race-free-based memory models (e.g., the models for Java and C++) to improve performance and scalability without sacrificing determinism, even in the presence of races. In our hybrid HW/SW approach, the only hardware mechanisms required are software-controlled store buffering and support for precise instruction counting; we do not require speculation. A runtime system uses these mechanisms to enforce determinism for arbitrary programs.
We evaluate RCDC using PARSEC benchmarks and show that relaxing memory ordering leads to performance and scalability close to nondeterministic execution without requiring any form of speculation. We also compare our new execution strategy to one based on TSO (total-store-ordering) and show that some applications benefit significantly from the extra relaxation. We also evaluate a software-only implementation of our new deterministic execution strategy.
- S. Adve and M. Hill. Weak Ordering -- A New Definition. In ISCA, 1990. Google ScholarDigital Library
- G. Altekar and I. Stoica. ODR: Output-Deterministic Replay for Multicore Debugging. In SOSP, 2009. Google ScholarDigital Library
- A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In OSDI, 2010. Google ScholarDigital Library
- T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: a Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, 2010. Google ScholarDigital Library
- T. Bergan, N. Hunt, L. Ceze, and S. Gribble. Deterministic Process Groups in dOS. In OSDI, 2010. Google ScholarDigital Library
- E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe and Efficient Concurrent Programming. In OOPSLA, 2009.Google Scholar
- C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In PACT, 2008. Google ScholarDigital Library
- G. Blelloch. NESL: A Nested Data-Parallel Language (Version 3.1). Technical report, Carnegie Mellon University, Pittsburgh, PA, 2007.Google ScholarDigital Library
- R. Bocchino, V. Adve, D. Dig, S. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, 2009. Google ScholarDigital Library
- H.-J. Boehm and S. Adve. Foundations of the C++ Concurrency Memory Model. In PLDI, 2008. Google ScholarDigital Library
- L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas. BulkSC: Bulk Enforcement of Sequential Consistency. In ISCA, 2007. Google ScholarDigital Library
- M. M. T. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Workshop on Declarative Aspects of Multicore Programming (DAMP), 2007. Google ScholarDigital Library
- H. Cui, J. Wu, C. che Tsai, and J. Yang. Stable Deterministic Multithreading through Schedule Memoization. In OSDI, 2010. Google ScholarDigital Library
- J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, 2009. Google ScholarDigital Library
- M. Dubois, J. Wang, L. Barroso, K. Lee, and Y. Chen. Delayed Consistency and Its Effects on the Miss Rate of Parallel Programs. In Supercomputing, 1991. Google ScholarDigital Library
- S. A. Edwards and O. Tardieu. SHIM: A Deterministic Model for Heterogeneous Embedded Systems. In EMSOFT, 2005. Google ScholarDigital Library
- K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, 1990. Google ScholarDigital Library
- C. Gniady, B. Falsafi, and T. N. Vijaykumar. Is SC ILP = RC? In ISCA, 1999. Google ScholarDigital Library
- D. Hower, P. Dudnik, D. Wood, and M. Hill. Calvin: Deterministic or Not? Free Will to Choose. In HPCA, 2011. Google ScholarDigital Library
- D. Hower and M. Hill. ReRun: Exploiting Episodes for Lightweight Memory Race Recording. In ISCA, 2008. Google ScholarDigital Library
- C. K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005. Google ScholarDigital Library
- J. Manson, W. Pugh, and S. Adve. The Java Memory Model. In POPL, 2005. Google ScholarDigital Library
- P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In ISCA, 2008. Google ScholarDigital Library
- S. Narayanasamy, C. Pereira, and B. Calder. Recording Shared Memory Dependencies Using Strata. In ASPLOS, 2006. Google ScholarDigital Library
- R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In PADD, 1993. Google ScholarDigital Library
- M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, 2009. Google ScholarDigital Library
- J. Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, 1982.Google Scholar
- S. Parka, W. Xiong, Z. Yin, R. Kaushik, K. Lee, S. Lu, and Y. Zhou. Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009. Google ScholarDigital Library
- M. Rinard and M. Lam. The Design, Implementation, and Evaluation of Jade. ACM TOPLAS, 20(3), 1988. Google ScholarDigital Library
- K. Russell and D. Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In OOPSLA, 2006. Google ScholarDigital Library
- W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, 2002. Google ScholarDigital Library
- E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. E. Smith, and M. Valero. Implementing Kilo-Instruction Multiprocessors. In ICPS, July 2005.Google ScholarCross Ref
- C. von Praun, H. Cain, J. Choi, and K. Ryu. Conditional Memory Ordering. In ISCA, 2006. Google ScholarDigital Library
- T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos. Mechanisms for Store-wait-free Multiprocessors. In ISCA, 2007. Google ScholarDigital Library
- M. Xu, R. Bodik, and M. Hill. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. In ISCA, 2003. Google ScholarDigital Library
- M. Xu, M. Hill, and R. Bodik. A Regulated Transitive Reduction for Longer Memory Race Recording. In ASPLOS, 2006. Google ScholarDigital Library
Index Terms
- RCDC: a relaxed consistency deterministic computer
Recommendations
RCDC: a relaxed consistency deterministic computer
ASPLOS '11Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that ...
RCDC: a relaxed consistency deterministic computer
ASPLOS '11Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that ...
On Thin Air Reads Towards an Event Structures Model of Relaxed Memory
LICS '16: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer ScienceThis is the first paper to propose a pure event structures model of relaxed memory. We propose confusion-free event structures over an alphabet with a justification relation as a model. Executions are modeled by justified configurations, where every ...
Comments