Article

Predicting Faults from Cached History

Authors:
Sunghun Kim, Massachusetts Institute of Technology, USA
Thomas Zimmermann, Saarland University, Germany
E. James Whitehead Jr., University of California, Santa Cruz, USA
Andreas Zeller, Saarland University, Germany

ICSE '07: Proceedings of the 29th international conference on Software Engineering, May 2007, Pages 489–498
https://doi.org/10.1109/ICSE.2007.66

Published: 24 May 2007

ABSTRACT

We analyze the version history of seven software systems to predict the most fault-prone entities and files. The basic assumption is that faults do not occur in isolation, but rather in bursts of several related faults. Therefore, we cache locations that are likely to have faults: starting from the location of a known (fixed) fault, we cache the location itself, any locations changed together with the fault, recently added locations, and recently changed locations. By consulting the cache at the moment a fault is fixed, a developer can detect likely fault-prone locations. This is useful for prioritizing verification and validation resources on the most fault-prone files or entities. In our evaluation of seven open source projects with more than 200,000 revisions, the cache selects 10% of the source code files; these files account for 73%-95% of faults, a significant advance beyond the state of the art.
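The caching scheme described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name `FaultCache`, its method names, the fixed `capacity`, and the least-recently-used eviction policy are all assumptions introduced here, since the abstract does not say how the cache is sized or how entries are replaced.

```python
from collections import OrderedDict


class FaultCache:
    """Hypothetical fixed-size cache of locations suspected to be fault-prone."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed: maximum number of cached locations
        self._entries = OrderedDict()     # location -> None, least recently loaded first

    def _load(self, location):
        # Insert or refresh a location, then evict the oldest entries
        # (assumed LRU policy) until the cache fits its capacity again.
        self._entries[location] = None
        self._entries.move_to_end(location)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def __contains__(self, location):
        return location in self._entries

    def on_change(self, added=(), changed=()):
        # Recently added and recently changed locations are cached.
        for location in (*added, *changed):
            self._load(location)

    def on_fault_fixed(self, fault_location, co_changed=()):
        # Consult the cache at the moment a fault is fixed: a hit means the
        # cache had already flagged this location as likely fault-prone.
        hit = fault_location in self
        # Cache the locations changed together with the fault, then the
        # fault location itself so that it is the most recent entry.
        for location in co_changed:
            self._load(location)
        self._load(fault_location)
        return hit

    def locations(self):
        # Currently cached locations, least recently loaded first.
        return list(self._entries)
```

In such a sketch, replaying a project's revision history through `on_change` and `on_fault_fixed` and counting how often `on_fault_fixed` returns `True` would give the fraction of faults that fell on cached locations, which is the kind of figure the abstract reports.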


Published in

ICSE '07: Proceedings of the 29th international conference on Software Engineering
May 2007
784 pages
ISBN: 0769528287
Publisher: IEEE Computer Society, United States