Debugging distributed systems

Published: 22 July 2016

Abstract

ShiViz is a new visualization tool for debugging distributed systems.

    • Published in

      Communications of the ACM, Volume 59, Issue 8
      August 2016
      94 pages
      ISSN: 0001-0782
      EISSN: 1557-7317
      DOI: 10.1145/2975594
      Editor: Moshe Y. Vardi

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Popular
      • Refereed
