Abstract
Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.
- Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., and Rasin, A. 2009. Hadoopdb: An architectural hybrid of mapreduce and DBMS technologies for analytical workloads. In Proceedings of the International Conference on Very Large Data Bases. 922--933.Google Scholar
- Adya, A., Gruber, R., Liskov, B., and Maheshwari, U. 1995. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 23--34. Google ScholarDigital Library
- Amazon. 2012. Amazon dynamodb. http://aws.amazon.com/dynamodb.Google Scholar
- Armbrust, M., Curtis, K., Kraska, T., Fox, A., Franklin, M., and Patterson, D. 2011. PIQL: Success-tolerant query processing in the cloud. In Proceedings of the International Conference on Very Large Data Bases. 181--192. Google ScholarDigital Library
- Baker, J., Bond, C., Corbett, J. C., Furman, J., Khorlin, A., Larson, J., Léon, J.-M., Li, Y., Lloyd, A., and Yushprakh, V. 2011. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of CIDR. 223--234.Google Scholar
- Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., and O’Neil, P. 1995. A critique of ANSI SQL isolation levels. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--10. Google ScholarDigital Library
- Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. 2008. Building a database on S3. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 251--264. Google ScholarDigital Library
- Chan, A. and Gray, R. 1985. Implementing distributed read-only transactions. IEEE Trans. Softw. Eng. SE-11, 2, 205--212. Google ScholarDigital Library
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2, 4:1--4:26. Google ScholarDigital Library
- Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., and Yerneni, R. 2008. PNUTS: Yahoo!’s hosted data serving platform. In Proceedings of the International Conference on Very Large Data Bases. 1277--1288. Google ScholarDigital Library
- Cowling, J. and Liskov, B. 2012. Granola: Low-overhead distributed transaction coordination. In Proceedings of USENIX ATC. 223--236. Google ScholarDigital Library
- Dean, J. and Ghemawat, S. 2010. MapReduce: A flexible data processing tool. Comm. ACM 53, 1, 72--77. Google ScholarDigital Library
- Douceur, J. and Howell, J. 2003. Scalable Byzantine-fault-quantifying clock synchronization. Tech. rep. MSR-TR-2003-67, MS Research.Google Scholar
- Douceur, J. R. and Howell, J. 2006. Distributed directory service in the Farsite file system. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 321--334. Google ScholarDigital Library
- Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the Symposium on Operating Systems Principles. 29--43. Google ScholarDigital Library
- Gifford, D. K. 1982. Information storage in a decentralized computer system. Tech. rep. CSL-81-8, Xerox PARC.Google Scholar
- Glendenning, L., Beschastnikh, I., Krishnamurthy, A., and Anderson, T. 2011. Scalable consistency in scatter. In Proceedings of the Symposium on Operating Systems Principles. Google ScholarDigital Library
- Google. 2008. Protocol buffers --- Google’s data interchange format. https://code.google.com/p/protobuf.Google Scholar
- Gray, J. and Lamport, L. 2006. Consensus on transaction commit. ACM Trans. Datab. Syst. 31, 1, 133--160. Google ScholarDigital Library
- Helland, P. 2007. Life beyond distributed transactions: An apostate’s opinion. In Proceedings of the Biennial Conference on Innovative Data Systems Research. 132--141.Google Scholar
- Herlihy, M. P. and Wing, J. M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 3, 463--492. Google ScholarDigital Library
- Lamport, L. 1998. The part-time parliament. ACM Trans. Comput. Syst. 16, 2, 133--169. Google ScholarDigital Library
- Lamport, L., Malkhi, D., and Zhou, L. 2010. Reconfiguring a state machine. SIGACT News 41, 1, 63--73. Google ScholarDigital Library
- Liskov, B. 1993. Practical uses of synchronized clocks in distributed systems. Distrib. Comput. 6, 4, 211--219. Google ScholarDigital Library
- Lomet, D. B. and Li, F. 2009. Improving transaction-time DBMS performance and functionality. In Proceedings of the International Conference on Data Engineering. 581--591. Google ScholarDigital Library
- Lorch, J. R., Adya, A., Bolosky, W. J., Chaiken, R., Douceur, J. R., and Howell, J. 2006. The SMART way to migrate replicated stateful services. In Proceedings of EuroSys. 103--115. Google ScholarDigital Library
- MarkLogic. 2012. Marklogic 5 product documentation. http://community.marklogic.com/docs.Google Scholar
- Marzullo, K. and Owicki, S. 1983. Maintaining the time in a distributed system. In Proceedings of the Symposium on Principles of Distributed Computing. 295--305. Google ScholarDigital Library
- Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivakumar, S., Tolton, M., and Vassilakis, T. 2010. Dremel: Interactive analysis of Web-scale datasets. In Proceedings of the International Conference on Very Large Data Bases. 330--339.Google Scholar
- Mills, D. 1981. Time synchronization in DCNET hosts. Internet project rep. IEN--173, COMSAT Laboratories.Google Scholar
- Oracle. 2012. Oracle total recall. http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf.Google Scholar
- Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., and Stonebraker, M. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 165--178. Google ScholarDigital Library
- Peng, D. and Dabek, F. 2010. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1--15. Google ScholarDigital Library
- Rosenkrantz, D. J., Stearns, R. E., and Lewis II, P. M. 1978. System level concurrency control for distributed database systems. ACM Trans. Datab. Syst. 3, 2, 178--198. Google ScholarDigital Library
- Shraer, A., Reed, B., Malkhi, D., and Junqueiera, F. 2012. Dynamic reconfiguration of primary/backup clusters. In Proceedings of USENIX ATC. 425--438. Google ScholarDigital Library
- Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B., Vingralek, R., Whipkey, C., Chen, X., Jegerlehner, B., Littlefield, K., and Tong, P. 2012. F1 --- The fault-tolerant distributed RDBMS supporting Google’s ad business. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 777--778. Google ScholarDigital Library
- Sovran, Y., Power, R., Aguilera, M. K., and Li, J. 2011. Transactional storage for geo-replicated systems. In Proceedings of the Symposium on Operating Systems Principles. 385--400. Google ScholarDigital Library
- Stonebraker, M. 2010a. Six SQL urban myths. http://highscalability.com/blog/2010/6/28/voltdb-decapitates -six-sql-urban-myths-and-delivers-internet.html.Google Scholar
- Stonebraker, M. 2010b. Why enterprises are uninterested in NoSQL. http://cacm.acm.org/blogs/blog-cacm/99512-why-enterprises-are-uninterested-in-nosql/fulltext.Google Scholar
- Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., and Helland, P. 2007. The end of an architectural era: (It’s time for a complete rewrite). In Proceedings of the International Conference on Very Large Data Bases. 1150--1160. Google ScholarDigital Library
- Thomson, A., Diamond, T., Shu-Chun Weng, T. D., Ren, K., Shao, P., and Abadi, D. J. 2012. Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--12. Google ScholarDigital Library
- Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., and Murthy, R. 2010. Hive --- A petabyte scale data warehouse using Hadoop. In Proceedings of the International Conference on Data Engineering. 996--1005.Google Scholar
- VoltDB. 2012. VoltDB resources. http://voltdb.com/resources/whitepapers.Google Scholar
Index Terms
- Spanner: Google’s Globally Distributed Database
Recommendations
Distributed lock management for mobile transactions
ICDCS '95: Proceedings of the 15th International Conference on Distributed Computing SystemsAbstract: We present a new lock management scheme which allows a read unlock for an item to be executed at any copy site of that item; the site may be different from the copy site on which the read lock is set. The scheme utilizes the replicated copies ...
A Non-Two-Phase Locking Protocol for Global Concurrency Control in Distributed Heterogeneous Database Systems
A concurrency control method is proposed for global transactions in a distributed heterogeneous database system. This method is applicable when the database sites are interconnected in a rooted tree fashion. It guarantees deadlock freedom in addition to ...
Concurrency Control in Distributed Databases Through Time Intervals and Short-Term Locks
A method for concurrency control in distributed database management systems that increases the level of concurrent execution of transactions, called ordering by serialization numbers (OSN), is proposed. The OSN method works in the certifier model and ...
Comments