skip to main content
research-article
Free Access

The Verification of a Distributed System: A practitioner’s guide to increasing confidence in system correctness

Published:01 December 2015Publication History
Skip Abstract Section

Abstract

Leslie Lamport, known for his seminal work in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable." Given this bleak outlook and the large set of possible failures, how do you even begin to verify and validate that the distributed systems you build are doing the right thing?

References

  1. Alvaro, P., Rosen, J., Hellerstein, J. M. 2015. Lineage-driven fault injection; http://www.cs.berkeley.edu/~palvaro/molly.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aniszczyk, C. 2012. Distributed systems tracing with Zipkin; https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin.Google ScholarGoogle Scholar
  3. Aphyr. Jepsen; https://aphyr.com/tags/Jepsen.Google ScholarGoogle Scholar
  4. Hedlund, M. 2014. Game day exercises at Stripe: learning from "kill -9"; https://stripe.com/blog/game-day-exercises-at-stripe.Google ScholarGoogle Scholar
  5. Izrailevsky, Y., Tseitlin, A. 2011. The Netflix Simian Army; http://techblog.netflix.com/2011/07/netflix-simian-army.html.Google ScholarGoogle Scholar
  6. Killian, C., Anderson, J. W., Jhala, R., Vahdat, A. 2006. Life, death, and the critical transition: finding liveness bugs in system code; http://www.macesystems.org/papers/MaceMC_TR.pdf.Google ScholarGoogle Scholar
  7. Lamport, L., Yu, Y. 2011. TLC the TLA+ model checker; http://research.microsoft.com/en-us/um/people/lamport/tla/tlc.html.Google ScholarGoogle Scholar
  8. Newcomb, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M. 2014. How Amazon Web Services uses formal methods. 2015. Communications of the ACM 58(4): 68-73; http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. QuickCheck; https://hackage.haskell.org/package/QuickCheck.Google ScholarGoogle Scholar
  10. Sheehy, J. 2015. There is no now. ACM Queue 13(3); https://queue.acm.org/detail.cfm?id=2745385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Spin; http://spinroot.com/spin/whatispin.html.Google ScholarGoogle Scholar
  12. Sigelman, B. H., Barroso, L. S., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., Shanbhag, C. 2010. Dapper, a large-scale distributed systems tracing infrastructure; http://research.google.com/pubs/pub36356.html.Google ScholarGoogle Scholar
  13. Thompson, A. 2012. QuickChecking poolboy for fun and profit; http://basho.com/posts/technical/quickchecking-poolboy-for-fun-and-profit/.Google ScholarGoogle Scholar
  14. Yang, J., Chen, T., Wu, M., Xu, Z., Liu, X., Lin, H., Yang, M., Long, F., Zhang, L., Zhou, L. 2009. MODIST: transparent model checking of unmodified distributed systems; https://www.usenix.org/legacy/events/nsdi09/tech/full_papers/yang/yang_html/index.html. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Yuan, D., Luo, Y., Zhuang, X., Rodrigues, G. R., Zhao, X., Zhang, Y., Jain, P. U., Stumm, M. 2014. Simple testing can prevent most critical failures: an analysis of production failures in distributed data-intensive systems; https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Wilcox, J. R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M. D., Anderson, T. 2015. Verdi: a framework for implementing and formally verifying distributed systems. Proceedings of the ACM SIGPLAN 2015 Conference on Programming Language Design and Implementation: 357-368; https://homes.cs.washington.edu/~mernst/pubs/verify-distsystem-pldi2015-abstract.html. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The Verification of a Distributed System: A practitioner’s guide to increasing confidence in system correctness
            Index terms have been assigned to the content through auto-classification.

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Queue
              Queue  Volume 13, Issue 9
              Structured Data
              November-December 2015
              156 pages
              ISSN:1542-7730
              EISSN:1542-7749
              DOI:10.1145/2857274
              Issue’s Table of Contents

              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 December 2015

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Popular
              • Editor picked

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader

            HTML Format

            View this article in HTML Format .

            View HTML Format