ABSTRACT
Like software bugs, configuration errors are one of the major causes of today's system failures. Many configuration issues manifest themselves in ways similar to software bugs, such as crashes, hangs, and silent failures. This leaves users clueless and forced to turn to developers for technical support, wasting not only users' but also developers' precious time and effort. Unfortunately, many software developers take a much less active, responsible role in handling configuration errors than they do with software bugs, because "they are users' faults."
This paper argues that software developers should take an active role in handling misconfigurations. It also takes a concrete first step towards this goal by providing tooling support that helps developers improve their configuration design and harden their systems against configuration errors. Specifically, we build a tool, called Spex, to automatically infer configuration requirements (referred to as constraints) from software source code, and then use the inferred constraints to: (1) expose misconfiguration vulnerabilities (i.e., bad system reactions to configuration errors, such as crashes, hangs, and silent failures); and (2) detect certain types of error-prone configuration design and handling.
We evaluate Spex on one commercial storage system and six open-source server applications. Spex automatically infers a total of 3800 constraints for more than 2500 configuration parameters. Based on these constraints, Spex detects 743 misconfiguration vulnerabilities of various kinds and at least 112 error-prone constraints in the latest versions of the evaluated systems. To date, developers have confirmed or fixed 364 vulnerabilities and 80 inconsistent constraints after we reported them. Our results have influenced the Squid Web proxy project to improve its configuration parsing library towards a more user-friendly design.
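The idea of using an inferred constraint to expose bad system reactions can be illustrated with a small sketch. This is not Spex's implementation or its constraint representation: the `RangeConstraint` class, the `worker_threads` parameter, its bounds, and both toy loaders are hypothetical, chosen only to show how a value that violates a known constraint can be fed to a system and the reaction labeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical inferred constraint: an integer parameter within [lo, hi]."""
    param: str
    lo: int
    hi: int

    def violating_values(self):
        # Values just outside each bound, plus a value of the wrong type.
        return [str(self.lo - 1), str(self.hi + 1), "not-a-number"]


class ConfigError(Exception):
    """Raised by a well-behaved system to pinpoint a bad setting."""


def classify_reaction(load_config, constraint, bad_value):
    """Feed one misconfiguration to the system and label how it reacts."""
    try:
        load_config({constraint.param: bad_value})
    except ConfigError:
        return "rejected with message"  # desired reaction
    except Exception:
        return "crash"                  # vulnerability: unhandled failure
    return "silent acceptance"          # vulnerability: bad value went unnoticed


def fragile_loader(settings):
    # Toy system under test (an assumption): parses but never checks the range.
    int(settings["worker_threads"])


def careful_loader(settings):
    # Toy system that validates the value and names the offending parameter.
    raw = settings["worker_threads"]
    try:
        value = int(raw)
    except ValueError:
        raise ConfigError(f"worker_threads must be an integer, got {raw!r}")
    if not 1 <= value <= 64:
        raise ConfigError(f"worker_threads must be in [1, 64], got {value}")


constraint = RangeConstraint("worker_threads", lo=1, hi=64)
fragile_report = [classify_reaction(fragile_loader, constraint, v)
                  for v in constraint.violating_values()]
careful_report = [classify_reaction(careful_loader, constraint, v)
                  for v in constraint.violating_values()]
```

In this toy setup the fragile loader silently accepts the two out-of-range values and crashes on the non-numeric one, while the careful loader rejects all three with a message that names the parameter. A real tool would have to observe the running system (exit status, logs, hangs) rather than Python exceptions.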