ABSTRACT
When developers change a program, regression tests can fail not only due to faults in the program but also due to out-of-date test code that does not reflect the desired behavior of the program. When this occurs, the test code must be repaired so that the tests pass. Repairing tests manually is difficult and time-consuming. We recently developed ReAssert, a tool that can automatically repair broken unit tests, but only if they lack complex control flow or operations on expected values.
This paper introduces symbolic test repair, a technique based on symbolic execution that can overcome some of ReAssert's limitations. We reproduce experiments from earlier work and find that symbolic test repair improves upon previously reported results both quantitatively and qualitatively. We also perform new experiments, which confirm the benefits of symbolic test repair and show surprising similarities in test failures for open-source Java and .NET programs. Our experiments use Pex, a powerful symbolic execution engine for .NET, and we find that Pex provides over half of the repairs that a theoretically ideal symbolic test repair could produce.