ABSTRACT
We analyze the version history of seven software systems to predict the most fault-prone entities and files. The basic assumption is that faults do not occur in isolation, but rather in bursts of several related faults. Therefore, we cache locations that are likely to have faults: starting from the location of a known (fixed) fault, we cache the location itself, any locations changed together with the fault, recently added locations, and recently changed locations. By consulting the cache at the moment a fault is fixed, a developer can detect likely fault-prone locations. This is useful for prioritizing verification and validation resources on the most fault-prone files or entities. In our evaluation of seven open source projects with more than 200,000 revisions, the cache selects 10% of the source code files; these files account for 73%-95% of faults, a significant advance beyond the state of the art.
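The caching idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `FaultCache`, its method names, the file-level granularity, and the least-recently-used eviction policy are all assumptions made for illustration, since the abstract does not specify how the cache is bounded or which entries are replaced.

```python
from collections import OrderedDict


class FaultCache:
    """Bounded set of locations believed to be fault-prone.

    Hypothetical sketch: LRU eviction and the method names are assumptions,
    not details taken from the abstract.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # path -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _load(self, path):
        # Insert or refresh a path, then evict the oldest entries if over capacity.
        self._entries[path] = None
        self._entries.move_to_end(path)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def on_change(self, added=(), changed=()):
        # Recently added and recently changed locations are cached.
        for path in list(added) + list(changed):
            self._load(path)

    def on_fault_fixed(self, faulty_path, co_changed=()):
        # Consult the cache at the moment a fault is fixed.
        hit = faulty_path in self._entries
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        # Cache the locations changed together with the fault, then the
        # faulty location itself (loaded last so it is the most recent).
        for path in co_changed:
            self._load(path)
        self._load(faulty_path)
        return hit

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def contents(self):
        return list(self._entries)


# Usage with made-up file names.
cache = FaultCache(capacity=3)
cache.on_change(added=["a.c"], changed=["b.c"])
first = cache.on_fault_fixed("c.c", co_changed=["d.c"])  # c.c was not cached yet
second = cache.on_fault_fixed("d.c")  # d.c was cached as a co-changed location
```

In this toy run the second fix is a cache hit because `d.c` was loaded when it changed together with an earlier fault, which is the "bursts of related faults" assumption in miniature. The fraction of fixes that hit the cache (`hit_rate()`) is one plausible way to measure how many faults fall in the cached files.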