Abstract
Leslie Lamport, known for his seminal work in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable." Given this bleak outlook and the large set of possible failures, how do you even begin to verify and validate that the distributed systems you build are doing the right thing?
- Alvaro, P., Rosen, J., Hellerstein, J. M. 2015. Lineage-driven fault injection; http://www.cs.berkeley.edu/~palvaro/molly.pdf. Google ScholarDigital Library
- Aniszczyk, C. 2012. Distributed systems tracing with Zipkin; https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin.Google Scholar
- Aphyr. Jepsen; https://aphyr.com/tags/Jepsen.Google Scholar
- Hedlund, M. 2014. Game day exercises at Stripe: learning from "kill -9"; https://stripe.com/blog/game-day-exercises-at-stripe.Google Scholar
- Izrailevsky, Y., Tseitlin, A. 2011. The Netflix Simian Army; http://techblog.netflix.com/2011/07/netflix-simian-army.html.Google Scholar
- Killian, C., Anderson, J. W., Jhala, R., Vahdat, A. 2006. Life, death, and the critical transition: finding liveness bugs in system code; http://www.macesystems.org/papers/MaceMC_TR.pdf.Google Scholar
- Lamport, L., Yu, Y. 2011. TLC the TLA+ model checker; http://research.microsoft.com/en-us/um/people/lamport/tla/tlc.html.Google Scholar
- Newcomb, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M. 2014. How Amazon Web Services uses formal methods. 2015. Communications of the ACM 58(4): 68-73; http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext. Google ScholarDigital Library
- QuickCheck; https://hackage.haskell.org/package/QuickCheck.Google Scholar
- Sheehy, J. 2015. There is no now. ACM Queue 13(3); https://queue.acm.org/detail.cfm?id=2745385. Google ScholarDigital Library
- Spin; http://spinroot.com/spin/whatispin.html.Google Scholar
- Sigelman, B. H., Barroso, L. S., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., Shanbhag, C. 2010. Dapper, a large-scale distributed systems tracing infrastructure; http://research.google.com/pubs/pub36356.html.Google Scholar
- Thompson, A. 2012. QuickChecking poolboy for fun and profit; http://basho.com/posts/technical/quickchecking-poolboy-for-fun-and-profit/.Google Scholar
- Yang, J., Chen, T., Wu, M., Xu, Z., Liu, X., Lin, H., Yang, M., Long, F., Zhang, L., Zhou, L. 2009. MODIST: transparent model checking of unmodified distributed systems; https://www.usenix.org/legacy/events/nsdi09/tech/full_papers/yang/yang_html/index.html. Google ScholarDigital Library
- Yuan, D., Luo, Y., Zhuang, X., Rodrigues, G. R., Zhao, X., Zhang, Y., Jain, P. U., Stumm, M. 2014. Simple testing can prevent most critical failures: an analysis of production failures in distributed data-intensive systems; https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan. Google ScholarDigital Library
- Wilcox, J. R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M. D., Anderson, T. 2015. Verdi: a framework for implementing and formally verifying distributed systems. Proceedings of the ACM SIGPLAN 2015 Conference on Programming Language Design and Implementation: 357-368; https://homes.cs.washington.edu/~mernst/pubs/verify-distsystem-pldi2015-abstract.html. Google ScholarDigital Library
Index Terms
- The Verification of a Distributed System: A practitioner’s guide to increasing confidence in system correctness
Recommendations
Formal verification of ASMs using MDGs
We present a framework for the formal verification of abstract state machine (ASM) designs using the multiway decision graphs (MDG) tool. ASM is a state based language for describing transition systems. MDG provides symbolic representation of transition ...
Comments