Abstract
This article describes Google’s implementation of a distributed Cron service, which serves the vast majority of internal teams that need periodic scheduling of compute jobs. Over the service's lifetime, we have learned many lessons about designing and implementing what might seem like a basic service. Here, we discuss the problems that distributed Crons face and outline some potential solutions.
Reliable Cron across the Planet: ...or How I stopped worrying and learned to love time