ABSTRACT
Automatically identifying commits that induce fixes is an important task, as it enables researchers to quickly and efficiently validate many types of software engineering analyses, such as software metrics or models for predicting faulty components. Previous work on SZZ, an algorithm designed by Śliwerski et al. and improved upon by Kim et al., provides a process for automatically identifying the fix-inducing predecessor lines of lines that are changed in a bug-fixing commit. However, no one has yet verified that the fix-inducing lines identified by SZZ are in fact responsible for introducing the fixed bug. Moreover, to back-track through previous revisions to the fix-inducing change, SZZ relies on annotation graphs, which are imprecise in the face of large blocks of modified code.
In this work we outline several improvements to the SZZ algorithm. First, we replace annotation graphs with line-number maps that track unique source lines as they change over the lifetime of the software. Second, we use DiffJ, a Java syntax-aware diff tool, to ignore comment and formatting changes in the source. Finally, we begin verifying how often a fix-inducing change identified by SZZ is the true source of a bug.
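To make the idea of a line-number map concrete, the following is a minimal sketch and not the authors' implementation. It assumes each revision is available as a list of source lines and uses Python's standard `difflib` to map unchanged lines between adjacent revisions. The function names `line_map` and `origin_revision` are hypothetical, and the sketch stops at the revision where a line was last added or modified rather than following a line through edits.

```python
from difflib import SequenceMatcher


def line_map(old_lines, new_lines):
    """Map 0-based line indexes in new_lines to indexes in old_lines.

    Only lines that the diff reports as unchanged are mapped; added or
    modified lines have no entry. (Hypothetical helper for illustration.)
    """
    mapping = {}
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for offset in range(i2 - i1):
                mapping[j1 + offset] = i1 + offset
    return mapping


def origin_revision(revisions, line_index):
    """Return the index of the revision that added or last modified a line.

    revisions is a list of revisions, oldest first, each a list of lines.
    line_index is a 0-based index into the newest revision.
    """
    rev = len(revisions) - 1
    idx = line_index
    while rev > 0:
        mapping = line_map(revisions[rev - 1], revisions[rev])
        if idx not in mapping:
            # The line does not exist unchanged in the previous revision.
            return rev
        idx = mapping[idx]
        rev -= 1
    return 0


# Three toy revisions of one file, oldest first.
history = [
    ["a", "b", "c"],
    ["a", "x", "b", "c"],
    ["a", "x", "b", "c2"],
]
print(origin_revision(history, 1))  # line "x"
print(origin_revision(history, 3))  # line "c2"
```

Because each map is keyed by line position rather than line text, two identical lines in the same file stay distinct as they are traced backward. A syntax-aware diff such as DiffJ could be substituted for the textual diff here so that comment-only and formatting-only edits do not count as modifications.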
- G. Canfora, L. Cerulo, and M. D. Penta. Identifying changed source code lines from version repositories. In MSR '07: Proceedings of the Fourth International Workshop on Mining Software Repositories, page 14, Washington, DC, USA, 2007. IEEE Computer Society.
- S. Kim, T. Zimmermann, K. Pan, and E. J. J. Whitehead. Automatic identification of bug-introducing changes. In ASE '06: Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, pages 81--90, Washington, DC, USA, 2006. IEEE Computer Society.
- J. Pace. A tool which compares java files based on content. http://www.incava.org/projects/java/diffj, 2007.
- J. Śliwerski, T. Zimmermann, and A. Zeller. When do changes induce fixes? SIGSOFT Softw. Eng. Notes, 30(4):1--5, 2005.
- C. Williams and J. Spacco. Branching and merging in the repository. In MSR '08: Proceedings of the Fifth International Workshop on Mining Software Repositories, Leipzig, Germany, 2008.
- T. Zimmermann, S. Kim, A. Zeller, and J. E. James Whitehead. Mining version archives for co-changed lines. In MSR '06: Proceedings of the 2006 International Workshop on Mining Software Repositories, pages 72--75, New York, NY, USA, 2006. ACM.
Index Terms
- SZZ revisited: verifying when changes induce fixes