ABSTRACT
MongoDB is a distributed database that supports replication and horizontal partitioning (sharding). MongoDB replica sets consist of a primary that accepts all client writes and then propagates those writes to the secondaries. Each member of the replica set contains the same set of data. For horizontal partitioning, each shard (or partition) is a replica set. This paper discusses the design and rationale behind MongoDB's implementation of a cluster-wide logical clock and causal consistency. The design leveraged ideas from across the research community to ensure that the implementation adds minimal processing overhead, tolerates possible operator errors, and gives protection against non-trusted client attacks. While the goal of the team was not to discover or test new algorithms, the practical implementation necessitated a novel combination of ideas from the research community on causal consistency, security, and minimal performance overhead at scale. This paper describes a large scale, practical implementation of causal consistency using a hybrid logical clock, adding the signing of logical time ranges to the protocol, and introducing performance optimizations necessary for systems at scale. The implementation seeks to define an event as a state change and as such must make forward progress guarantees even during periods of no state changes for a partition of data.
- M. Ahamad, G. Neiger, J. E. Burns et al: Causal Memory: Definitions, Implementation and Programming. Distributed Computing (1995) 9: 37. Google ScholarDigital Library
- D. D. Akkoorath, A. Z. Tomsic, M. Bravo, Z. Li, T. Crain, A. Bieniusa, N. Preguica, and M. Shapiro. Cure: Strong semantics meets high availability and low latency. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), volume 00, pages 405--414, June 2016.Google ScholarCross Ref
- Amazon. Keeping Time With Amazon Time Sync Service. https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync- serviceGoogle Scholar
- Apache, SOLR 7.6.0 http://lucene.apache.org/solr/, 2019.Google Scholar
- P. Bailis, A. Ghodsi, J. M. Hellerstein, I. Stoica Bolt-on Causal Consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 761--772Google Scholar
- P. Bailis and A. Ghodsi. Eventual Consistency today: Limitations, extensions, and beyond. ACM Queue, 11(3), 2013 Google ScholarDigital Library
- P. Bailis, K. Kingsbury. The Network is Reliable. ACM queue, 12(7), July 23, 2014. Google ScholarDigital Library
- P. Bailis at al. Highly Available Transactions: Virtues and Limitations, Proceedings of the VLDB Endowment Volume 7 (3): 181--192, November 2013. Google ScholarDigital Library
- P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica. Probabilistically Bounded Staleness for Practical Partial Quorums. PVLDB, 5(8):776--787, 2012. Google ScholarDigital Library
- P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The potential dangers of causal consistency and an explicit solution. In SOCC 2012. Google ScholarDigital Library
- Basho, RIAK KV, http://basho.com/products/riak-kv/, 2019.Google Scholar
- D. R. Cheriton, D. Skeen. Understanding the limitations of causal and totally ordered multicast. In Proceedings of the 14th Symposium on Operating System Principles (SOSP '93, Asheville, NC, 1993). 44- 57. Google ScholarDigital Library
- B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.- A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. Proc. VLDB Endow., 1(2):1277--1288, Aug. 2008. Google ScholarDigital Library
- J. C. Corbett, J. Dean, M. Epstein et al. Spanner: Google's Globally-Distributed Database, In Proceedings of OSDI 2012. Google ScholarDigital Library
- G. DeCandia et al. Dynamo: Amazon's Highly Available Key-value Store. SOSP '07 Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles. pp. 205--222. Google ScholarDigital Library
- D. Didona, R. Guerraoui, J. Wang, and W. Zwaenepoel. Causal consistency and latency optimality: Friend or foe?. Technical Report 256091, EPFL, July 2018.Google Scholar
- J. Du, C. Iorgulescu, A. Roy, and W. Zwaenepoel. Gentlerain: Cheap and scalable causal consistency with physical clocks. In Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, pages 4:1--4:13, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- C. J. Fidge. Timestamps in Message-Passing Systems That Preserve the Partial Ordering. Australian Computer Science Communications, 10(1):56--66, February 1988.Google Scholar
- S. Gilbert, N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, 33(2):51--59, 2002. Google ScholarDigital Library
- M. Kleppmann, "A critique of the CAP theorem," 2015, http:// arxiv.org/abs/1509.05393.Google Scholar
- S. Kulkarni, M. Demirbas, D. Madappa, B. Avva, and M. Leone. Logical physical clocks. In Principles of Distributed Systems, volume 8878:17--32, 2014.Google ScholarCross Ref
- A. Lakshman P. Malik. Cassandra - A Decentralized Structured Storage System ACM SIGOPS Operating Systems Review, 44(2): 35--40, April 2010. Google ScholarDigital Library
- L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558--565, July 1978. Google ScholarDigital Library
- S. Liu, P. Viotti, C. Cachin et al. XFT: Practical Fault Tolerance Beyond Crashes. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI '16), Savannah, GA, November 2016.Google Scholar
- Wyatt Lloyd, Michael J Freedman, Michael Kaminsky, and David G Andersen. Don't settle for eventual: Scalable causal consistency for wide-area storage with COPS. In 23rd ACM Symposium on Operating Systems Principles (SOSP), pages 401--416, October 2011. Google ScholarDigital Library
- P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, convergence. Technical Report TR-11--22, Computer Science Department, UT Austin, May 2011.Google Scholar
- S. A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. I can't believe it's not causal! Scalable causal consistency with no slowdown cascades. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), pages 453--468, Boston, MA, 2017. USENIX. Google ScholarDigital Library
- D. Mills. A brief history of ntp time: Memoirs of an internet timekeeper. ACM SIGCOMM Computer Communication Review, 33(2):9--21, 2003. Google ScholarDigital Library
- MongoDB Atlas, https://www.mongodb.com/cloud/atlas, 2019.Google Scholar
- MongoDB Evergreen: https://evergreen.mongodb.com, 2019.Google Scholar
- MongoDB Manual 3.6. ReadConcern https://docs.mongodb.com/v3.6/reference/read-concern/.Google Scholar
- MongoDB Manual 3.6. WriteConcern. https://docs.mongodb.com/v3.6/reference/write-concern/.Google Scholar
- MongoDB Manual 3.6 Change Streams. https://docs.mongodb.com/v3.6/changeStreams/.Google Scholar
- MongoDB Manual 3.6 Zone Sharding. https://docs.mongodb.com/v3.6/tutorial/manage-shard-zone/.Google Scholar
- MongoDB source code: https://github.com/mongodb/mongo/blob/r3.7.2/src/mongo/db/logical_clock.cpp #L114-L158.Google Scholar
- MongoDB source code: https://github.com/mongodb/mongo/blob/r3.7.2/src/mongo/db/logical_clock.cpp #L99-L112.Google Scholar
- K. Patella, MongoDB 3.6.4 http://jepsen.io/analyses/mongodb-3--6--4.Google Scholar
- TPC-C benchmark, http://www.tpc.org/tpcc/, 2019.Google Scholar
Index Terms
- Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB
Recommendations
Causal consistency: beyond memory
PPoPP '16In distributed systems where strong consistency is costly when not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each computing entity, ...
Bolt-on causal consistency
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of DataWe consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the safety of an underlying general-purpose data store. ...
Causal consistency: beyond memory
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingIn distributed systems where strong consistency is costly when not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each computing entity, ...
Comments