skip to main content
research-article
Open Access

Spanner: Google’s Globally Distributed Database

Published:01 August 2013Publication History
Skip Abstract Section

Abstract

Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.

References

  1. Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., and Rasin, A. 2009. Hadoopdb: An architectural hybrid of mapreduce and DBMS technologies for analytical workloads. In Proceedings of the International Conference on Very Large Data Bases. 922--933.Google ScholarGoogle Scholar
  2. Adya, A., Gruber, R., Liskov, B., and Maheshwari, U. 1995. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 23--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Amazon. 2012. Amazon dynamodb. http://aws.amazon.com/dynamodb.Google ScholarGoogle Scholar
  4. Armbrust, M., Curtis, K., Kraska, T., Fox, A., Franklin, M., and Patterson, D. 2011. PIQL: Success-tolerant query processing in the cloud. In Proceedings of the International Conference on Very Large Data Bases. 181--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Baker, J., Bond, C., Corbett, J. C., Furman, J., Khorlin, A., Larson, J., Léon, J.-M., Li, Y., Lloyd, A., and Yushprakh, V. 2011. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of CIDR. 223--234.Google ScholarGoogle Scholar
  6. Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., and O’Neil, P. 1995. A critique of ANSI SQL isolation levels. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. 2008. Building a database on S3. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 251--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chan, A. and Gray, R. 1985. Implementing distributed read-only transactions. IEEE Trans. Softw. Eng. SE-11, 2, 205--212. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2, 4:1--4:26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., and Yerneni, R. 2008. PNUTS: Yahoo!’s hosted data serving platform. In Proceedings of the International Conference on Very Large Data Bases. 1277--1288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cowling, J. and Liskov, B. 2012. Granola: Low-overhead distributed transaction coordination. In Proceedings of USENIX ATC. 223--236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dean, J. and Ghemawat, S. 2010. MapReduce: A flexible data processing tool. Comm. ACM 53, 1, 72--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Douceur, J. and Howell, J. 2003. Scalable Byzantine-fault-quantifying clock synchronization. Tech. rep. MSR-TR-2003-67, MS Research.Google ScholarGoogle Scholar
  14. Douceur, J. R. and Howell, J. 2006. Distributed directory service in the Farsite file system. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 321--334. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the Symposium on Operating Systems Principles. 29--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gifford, D. K. 1982. Information storage in a decentralized computer system. Tech. rep. CSL-81-8, Xerox PARC.Google ScholarGoogle Scholar
  17. Glendenning, L., Beschastnikh, I., Krishnamurthy, A., and Anderson, T. 2011. Scalable consistency in scatter. In Proceedings of the Symposium on Operating Systems Principles. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Google. 2008. Protocol buffers --- Google’s data interchange format. https://code.google.com/p/protobuf.Google ScholarGoogle Scholar
  19. Gray, J. and Lamport, L. 2006. Consensus on transaction commit. ACM Trans. Datab. Syst. 31, 1, 133--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Helland, P. 2007. Life beyond distributed transactions: An apostate’s opinion. In Proceedings of the Biennial Conference on Innovative Data Systems Research. 132--141.Google ScholarGoogle Scholar
  21. Herlihy, M. P. and Wing, J. M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 3, 463--492. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Lamport, L. 1998. The part-time parliament. ACM Trans. Comput. Syst. 16, 2, 133--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Lamport, L., Malkhi, D., and Zhou, L. 2010. Reconfiguring a state machine. SIGACT News 41, 1, 63--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Liskov, B. 1993. Practical uses of synchronized clocks in distributed systems. Distrib. Comput. 6, 4, 211--219. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Lomet, D. B. and Li, F. 2009. Improving transaction-time DBMS performance and functionality. In Proceedings of the International Conference on Data Engineering. 581--591. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Lorch, J. R., Adya, A., Bolosky, W. J., Chaiken, R., Douceur, J. R., and Howell, J. 2006. The SMART way to migrate replicated stateful services. In Proceedings of EuroSys. 103--115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. MarkLogic. 2012. Marklogic 5 product documentation. http://community.marklogic.com/docs.Google ScholarGoogle Scholar
  28. Marzullo, K. and Owicki, S. 1983. Maintaining the time in a distributed system. In Proceedings of the Symposium on Principles of Distributed Computing. 295--305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivakumar, S., Tolton, M., and Vassilakis, T. 2010. Dremel: Interactive analysis of Web-scale datasets. In Proceedings of the International Conference on Very Large Data Bases. 330--339.Google ScholarGoogle Scholar
  30. Mills, D. 1981. Time synchronization in DCNET hosts. Internet project rep. IEN--173, COMSAT Laboratories.Google ScholarGoogle Scholar
  31. Oracle. 2012. Oracle total recall. http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf.Google ScholarGoogle Scholar
  32. Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., and Stonebraker, M. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 165--178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Peng, D. and Dabek, F. 2010. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Rosenkrantz, D. J., Stearns, R. E., and Lewis II, P. M. 1978. System level concurrency control for distributed database systems. ACM Trans. Datab. Syst. 3, 2, 178--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Shraer, A., Reed, B., Malkhi, D., and Junqueiera, F. 2012. Dynamic reconfiguration of primary/backup clusters. In Proceedings of USENIX ATC. 425--438. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B., Vingralek, R., Whipkey, C., Chen, X., Jegerlehner, B., Littlefield, K., and Tong, P. 2012. F1 --- The fault-tolerant distributed RDBMS supporting Google’s ad business. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 777--778. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Sovran, Y., Power, R., Aguilera, M. K., and Li, J. 2011. Transactional storage for geo-replicated systems. In Proceedings of the Symposium on Operating Systems Principles. 385--400. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Stonebraker, M. 2010a. Six SQL urban myths. http://highscalability.com/blog/2010/6/28/voltdb-decapitates -six-sql-urban-myths-and-delivers-internet.html.Google ScholarGoogle Scholar
  39. Stonebraker, M. 2010b. Why enterprises are uninterested in NoSQL. http://cacm.acm.org/blogs/blog-cacm/99512-why-enterprises-are-uninterested-in-nosql/fulltext.Google ScholarGoogle Scholar
  40. Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., and Helland, P. 2007. The end of an architectural era: (It’s time for a complete rewrite). In Proceedings of the International Conference on Very Large Data Bases. 1150--1160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Thomson, A., Diamond, T., Shu-Chun Weng, T. D., Ren, K., Shao, P., and Abadi, D. J. 2012. Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., and Murthy, R. 2010. Hive --- A petabyte scale data warehouse using Hadoop. In Proceedings of the International Conference on Data Engineering. 996--1005.Google ScholarGoogle Scholar
  43. VoltDB. 2012. VoltDB resources. http://voltdb.com/resources/whitepapers.Google ScholarGoogle Scholar

Index Terms

  1. Spanner: Google’s Globally Distributed Database

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Computer Systems
            ACM Transactions on Computer Systems  Volume 31, Issue 3
            August 2013
            94 pages
            ISSN:0734-2071
            EISSN:1557-7333
            DOI:10.1145/2518037
            Issue’s Table of Contents

            Copyright © 2013 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 August 2013
            • Accepted: 1 May 2013
            • Received: 1 January 2013
            Published in tocs Volume 31, Issue 3

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader