ABSTRACT
Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.
Index Terms
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases