Abstract
Failure is inevitable. Disks fail. Software bugs lie dormant waiting for just the right conditions to bite. People make mistakes. Data centers are built on farms of unreliable commodity hardware. If you’re running in a cloud environment, then many of these factors are outside of your control. To compound the problem, failure is not predictable and doesn’t occur with uniform probability and frequency. The lack of a uniform frequency increases uncertainty and risk in the system. In the face of such inevitable and unpredictable failure, how can you build a reliable service that provides the high level of availability your users can depend on?