Abstract
The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order of predicted number of faults, and the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate: the selected files contained between 71% and 92% of the faults, with an overall average of 83%. In addition, the same model was applied to data for the same system's releases, but with all fault data prior to integration testing removed. The predictions were again very accurate, with the selected files containing between 71% and 93% of the faults and an average of 84%. Predictions were also made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered, which correctly identified 73% and 74% of the faults for the two systems, respectively.
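The evaluation described above (rank files by predicted fault count, keep the first 20%, and measure what share of the actual faults those files contain) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the file names, and all fault counts are hypothetical. The rounding rule for "20% of the files" is also an assumption, since the abstract does not state one.

```python
import math


def top_fraction_fault_share(predicted, actual, fraction=0.20):
    """Share of actual faults found in the files ranked highest by prediction.

    predicted: dict mapping file name -> predicted fault count
    actual:    dict mapping file name -> observed fault count
    """
    if predicted.keys() != actual.keys():
        raise ValueError("predicted and actual must cover the same files")

    # Sort files in descending order of predicted fault count.
    ranked = sorted(predicted, key=lambda name: predicted[name], reverse=True)

    # Keep the first `fraction` of the ranked files, rounding down but
    # always selecting at least one file (assumed rounding rule).
    n_selected = max(1, math.floor(len(ranked) * fraction))
    selected = ranked[:n_selected]

    total_faults = sum(actual.values())
    if total_faults == 0:
        return 0.0
    return sum(actual[name] for name in selected) / total_faults


# Hypothetical release with ten files; the numbers are made up for illustration.
predicted = {
    "inv_core.c": 9.1, "orders.c": 6.4, "ui_main.c": 0.5, "report.c": 0.4,
    "util.c": 0.3, "db_io.c": 0.3, "auth.c": 0.2, "log.c": 0.1,
    "config.c": 0.1, "help.c": 0.0,
}
actual = {
    "inv_core.c": 7, "orders.c": 2, "ui_main.c": 1, "report.c": 0,
    "util.c": 0, "db_io.c": 0, "auth.c": 0, "log.c": 0,
    "config.c": 0, "help.c": 0,
}

share = top_fraction_fault_share(predicted, actual)
print(f"Top 20% of files contain {share:.0%} of the faults")
```

In a setup like the one the abstract describes, the `predicted` values would come from a regression model fitted on earlier releases, and the same computation would be repeated for each release to obtain a per-release percentage.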
Where the bugs are. ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis.