- Abadi, D. Consistency trade-offs in modern distributed database system design: CAP is only part of the story. Computer 45, 2 (2012), 37--42; http://dl.acm.org/citation.cfm?id=2360959.
- Amazon Web Services. Summary of the Amazon EC2 and Amazon RDS service disruption in the US East region, 2011; http://aws.amazon.com/message/65648/.
- Bailis, P., Davidson, A., Fekete, A., Ghodsi, A., Hellerstein, J.M. and Stoica, I. Highly available transactions: virtues and limitations. In Proceedings of VLDB 2014 (to appear); http://www.bailis.org/papers/hat-vldb2014.pdf.
- Bailis, P., Fekete, A., Franklin, M.J., Ghodsi, A., Hellerstein, J.M. and Stoica, I. Coordination-avoiding database systems, 2014; http://arxiv.org/abs/1402.2237.
- Bailis, P. and Ghodsi, A. Eventual consistency today: Limitations, extensions, and beyond. ACM Queue 11, 3 (2013); http://queue.acm.org/detail.cfm?id=2462076.
- CityCloud, 2011; https://www.citycloud.eu/cloud-computing/post-mortem/.
- Davidson, S.B., Garcia-Molina, H. and Skeen, D. Consistency in a partitioned network: A survey. ACM Computing Surveys 17, 3 (1985), 341--370; http://dl.acm.org/citation.cfm?id=5508.
- Dwork, C., Lynch, N. and Stockmeyer, L. Consensus in the presence of partial synchrony. JACM 35, 2 (1988), 288--323; http://dl.acm.org/citation.cfm?id=42283.
- Fischer, M.J., Lynch, N.A. and Paterson, M.S. Impossibility of distributed consensus with one faulty process. JACM 32, 2 (1985), 374--382; http://dl.acm.org/citation.cfm?id=214121.
- Fog Creek Software. May 5--6 network maintenance post-mortem; http://status.fogcreek.com/2012/05/may-5-6-network-maintenance-post-mortem.html.
- Gilbert, S. and Lynch, N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33, 2 (2002), 51--59; http://dl.acm.org/citation.cfm?id=564601.
- Gill, P., Jain, N. and Nagappan, N. Understanding network failures in data centers: Measurement, analysis, and implications. In Proceedings of SIGCOMM '11; http://research.microsoft.com/enus/um/people/navendu/papers/sigcomm11netwiser.pdf.
- Github. Github availability this week, 2012; https://github.com/blog/1261-github-availability-this-week.
- Kielhofner, K. Packets of death; http://blog.krisk.org/2013/02/packets-of-death.html.
- Lillich, J. Post mortem: Network issues last week; http://www.freistil.it/2013/02/post-mortem-network-issues-last-week/.
- Narayan, P.P.S. Sherpa update, 2010; https://developer.yahoo.com/blogs/ydn/sherpa-7992.html#4.
- Prince, M. Today's outage post mortem, 2013; http://blog.cloudflare.com/todays-outage-post-mortem-82515.
- Turner, D., Levchenko, K., Snoeren, A. and Savage, S. California fault lines: Understanding the causes and impact of network failures. In Proceedings of SIGCOMM '10; http://cseweb.ucsd.edu/~snoeren/papers/cenic-sigcomm10.pdf.
- Twilio. Billing incident post-mortem: breakdown, analysis and root cause; http://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html.