DOI: 10.1109/ICSE.2007.66
Article

Predicting Faults from Cached History

Published: 24 May 2007

ABSTRACT

We analyze the version history of seven software systems to predict the most fault-prone entities and files. The basic assumption is that faults do not occur in isolation, but rather in bursts of several related faults. Therefore, we cache locations that are likely to have faults: starting from the location of a known (fixed) fault, we cache the location itself, any locations changed together with the fault, recently added locations, and recently changed locations. By consulting the cache at the moment a fault is fixed, a developer can detect likely fault-prone locations. This is useful for prioritizing verification and validation resources on the most fault-prone files or entities. In our evaluation of seven open source projects with more than 200,000 revisions, the cache selects 10% of the source code files; these files account for 73%-95% of faults, a significant advance beyond the state of the art.
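The caching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `FaultCache`, its method names, the fixed `capacity`, and the least-recently-used eviction rule are all assumptions made for illustration, since the abstract does not specify how the cache is sized or how entries are replaced. The sketch only shows the four loading rules named above (the fixed fault's location, co-changed locations, recently added locations, recently changed locations) and how a hit is counted when a fault is fixed.

```python
from collections import OrderedDict


class FaultCache:
    """Hypothetical fixed-size cache of locations suspected to be fault-prone.

    Eviction is least-recently-used; that policy is an assumption of this
    sketch, not something stated in the abstract.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def _load(self, location):
        # Refresh an existing entry, or insert a new one and evict the
        # least recently used entry if the cache is full.
        if location in self._entries:
            self._entries.move_to_end(location)
            return
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)
        self._entries[location] = True

    def on_fault_fixed(self, location, co_changed=()):
        """Record a fixed fault; return True if the location was already cached."""
        hit = location in self._entries
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        # Cache locations changed together with the fault, then the fault
        # location itself so that it ends up as the most recent entry.
        for other in co_changed:
            self._load(other)
        self._load(location)
        return hit

    def on_added(self, location):
        """Cache a recently added location."""
        self._load(location)

    def on_changed(self, location):
        """Cache a recently changed location."""
        self._load(location)

    def contents(self):
        """Return cached locations, least recently used first."""
        return list(self._entries)

    def hit_rate(self):
        """Fraction of fixed faults whose location was already in the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = FaultCache(capacity=3)
    cache.on_added("a.c")
    cache.on_fault_fixed("b.c", co_changed=["c.c"])
    cache.on_fault_fixed("c.c")
    print(cache.contents(), cache.hit_rate())
```

In this sketch the hit rate plays the role of the evaluation figure in the abstract: it measures how often a fault turns up in a location the cache had already flagged. The file names in the demo are placeholders.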


Published in

ICSE '07: Proceedings of the 29th International Conference on Software Engineering
May 2007
784 pages
ISBN: 0769528287

Publisher: IEEE Computer Society, United States
