skip to main content
10.1145/3299869.3314049acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open Access

Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB

Published:25 June 2019Publication History

ABSTRACT

MongoDB is a distributed database that supports replication and horizontal partitioning (sharding). MongoDB replica sets consist of a primary that accepts all client writes and then propagates those writes to the secondaries. Each member of the replica set contains the same set of data. For horizontal partitioning, each shard (or partition) is a replica set. This paper discusses the design and rationale behind MongoDB's implementation of a cluster-wide logical clock and causal consistency. The design leveraged ideas from across the research community to ensure that the implementation adds minimal processing overhead, tolerates possible operator errors, and gives protection against non-trusted client attacks. While the goal of the team was not to discover or test new algorithms, the practical implementation necessitated a novel combination of ideas from the research community on causal consistency, security, and minimal performance overhead at scale. This paper describes a large scale, practical implementation of causal consistency using a hybrid logical clock, adding the signing of logical time ranges to the protocol, and introducing performance optimizations necessary for systems at scale. The implementation seeks to define an event as a state change and as such must make forward progress guarantees even during periods of no state changes for a partition of data.

References

  1. M. Ahamad, G. Neiger, J. E. Burns et al: Causal Memory: Definitions, Implementation and Programming. Distributed Computing (1995) 9: 37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. D. Akkoorath, A. Z. Tomsic, M. Bravo, Z. Li, T. Crain, A. Bieniusa, N. Preguica, and M. Shapiro. Cure: Strong semantics meets high availability and low latency. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), volume 00, pages 405--414, June 2016.Google ScholarGoogle ScholarCross RefCross Ref
  3. Amazon. Keeping Time With Amazon Time Sync Service. https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync- serviceGoogle ScholarGoogle Scholar
  4. Apache, SOLR 7.6.0 http://lucene.apache.org/solr/, 2019.Google ScholarGoogle Scholar
  5. P. Bailis, A. Ghodsi, J. M. Hellerstein, I. Stoica Bolt-on Causal Consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 761--772Google ScholarGoogle Scholar
  6. P. Bailis and A. Ghodsi. Eventual Consistency today: Limitations, extensions, and beyond. ACM Queue, 11(3), 2013 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Bailis, K. Kingsbury. The Network is Reliable. ACM queue, 12(7), July 23, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Bailis at al. Highly Available Transactions: Virtues and Limitations, Proceedings of the VLDB Endowment Volume 7 (3): 181--192, November 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica. Probabilistically Bounded Staleness for Practical Partial Quorums. PVLDB, 5(8):776--787, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The potential dangers of causal consistency and an explicit solution. In SOCC 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Basho, RIAK KV, http://basho.com/products/riak-kv/, 2019.Google ScholarGoogle Scholar
  12. D. R. Cheriton, D. Skeen. Understanding the limitations of causal and totally ordered multicast. In Proceedings of the 14th Symposium on Operating System Principles (SOSP '93, Asheville, NC, 1993). 44- 57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.- A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. Proc. VLDB Endow., 1(2):1277--1288, Aug. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. C. Corbett, J. Dean, M. Epstein et al. Spanner: Google's Globally-Distributed Database, In Proceedings of OSDI 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. DeCandia et al. Dynamo: Amazon's Highly Available Key-value Store. SOSP '07 Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles. pp. 205--222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Didona, R. Guerraoui, J. Wang, and W. Zwaenepoel. Causal consistency and latency optimality: Friend or foe?. Technical Report 256091, EPFL, July 2018.Google ScholarGoogle Scholar
  17. J. Du, C. Iorgulescu, A. Roy, and W. Zwaenepoel. Gentlerain: Cheap and scalable causal consistency with physical clocks. In Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, pages 4:1--4:13, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. J. Fidge. Timestamps in Message-Passing Systems That Preserve the Partial Ordering. Australian Computer Science Communications, 10(1):56--66, February 1988.Google ScholarGoogle Scholar
  19. S. Gilbert, N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, 33(2):51--59, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Kleppmann, "A critique of the CAP theorem," 2015, http:// arxiv.org/abs/1509.05393.Google ScholarGoogle Scholar
  21. S. Kulkarni, M. Demirbas, D. Madappa, B. Avva, and M. Leone. Logical physical clocks. In Principles of Distributed Systems, volume 8878:17--32, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  22. A. Lakshman P. Malik. Cassandra - A Decentralized Structured Storage System ACM SIGOPS Operating Systems Review, 44(2): 35--40, April 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558--565, July 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Liu, P. Viotti, C. Cachin et al. XFT: Practical Fault Tolerance Beyond Crashes. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI '16), Savannah, GA, November 2016.Google ScholarGoogle Scholar
  25. Wyatt Lloyd, Michael J Freedman, Michael Kaminsky, and David G Andersen. Don't settle for eventual: Scalable causal consistency for wide-area storage with COPS. In 23rd ACM Symposium on Operating Systems Principles (SOSP), pages 401--416, October 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, convergence. Technical Report TR-11--22, Computer Science Department, UT Austin, May 2011.Google ScholarGoogle Scholar
  27. S. A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. I can't believe it's not causal! Scalable causal consistency with no slowdown cascades. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), pages 453--468, Boston, MA, 2017. USENIX. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Mills. A brief history of ntp time: Memoirs of an internet timekeeper. ACM SIGCOMM Computer Communication Review, 33(2):9--21, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. MongoDB Atlas, https://www.mongodb.com/cloud/atlas, 2019.Google ScholarGoogle Scholar
  30. MongoDB Evergreen: https://evergreen.mongodb.com, 2019.Google ScholarGoogle Scholar
  31. MongoDB Manual 3.6. ReadConcern https://docs.mongodb.com/v3.6/reference/read-concern/.Google ScholarGoogle Scholar
  32. MongoDB Manual 3.6. WriteConcern. https://docs.mongodb.com/v3.6/reference/write-concern/.Google ScholarGoogle Scholar
  33. MongoDB Manual 3.6 Change Streams. https://docs.mongodb.com/v3.6/changeStreams/.Google ScholarGoogle Scholar
  34. MongoDB Manual 3.6 Zone Sharding. https://docs.mongodb.com/v3.6/tutorial/manage-shard-zone/.Google ScholarGoogle Scholar
  35. MongoDB source code: https://github.com/mongodb/mongo/blob/r3.7.2/src/mongo/db/logical_clock.cpp #L114-L158.Google ScholarGoogle Scholar
  36. MongoDB source code: https://github.com/mongodb/mongo/blob/r3.7.2/src/mongo/db/logical_clock.cpp #L99-L112.Google ScholarGoogle Scholar
  37. K. Patella, MongoDB 3.6.4 http://jepsen.io/analyses/mongodb-3--6--4.Google ScholarGoogle Scholar
  38. TPC-C benchmark, http://www.tpc.org/tpcc/, 2019.Google ScholarGoogle Scholar

Index Terms

  1. Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
      June 2019
      2106 pages
      ISBN:9781450356435
      DOI:10.1145/3299869

      Copyright © 2019 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 June 2019

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader