Abstract
Software effort estimation studies still suffer from discordant empirical results (i.e., conclusion instability), mainly due to the lack of rigorous benchmarking methods. To date, only one baseline model, the Automatically Transformed Linear Model (ATLM), has been proposed, yet it has not been extensively assessed. In this article, we propose a novel method based on Linear Programming (dubbed Linear Programming for Effort Estimation, LP4EE) and carry out a thorough empirical study to evaluate the effectiveness of both LP4EE and ATLM for benchmarking widely used effort estimation techniques. The results of our study confirm the need to benchmark every other proposal against accurate and robust baselines. They also reveal that LP4EE is more accurate than ATLM for 17% of the experiments and more robust than ATLM against different data splits and cross-validation methods for 44% of the cases. These results suggest that using LP4EE as a baseline can help reduce conclusion instability. We make an open-source implementation of LP4EE publicly available in order to facilitate its adoption in future studies.
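To illustrate the idea of a linear-programming baseline, the sketch below fits a linear effort model by minimising the sum of absolute residuals, a problem that can be cast exactly as a linear programme. This is a minimal illustration in the spirit of LP4EE, not the authors' exact formulation; the function names, the use of `scipy.optimize.linprog`, and the toy features (`size`, `team`) are all assumptions for the example.

```python
# Sketch of an LP-based effort-estimation baseline (assumed formulation,
# in the spirit of LP4EE; not the authors' exact method).
# Fit weights w by minimising sum(|y - X @ w|), rewritten as an LP:
#   minimise sum(e)  subject to  -e <= y - X @ w <= e,  e >= 0, w free.
import numpy as np
from scipy.optimize import linprog

def lp_fit(X, y):
    """Least-absolute-deviations fit via linear programming."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])   # minimise sum of e
    A_ub = np.block([[X, -np.eye(n)],               #  X@w - e <= y
                     [-X, -np.eye(n)]])             # -X@w - e <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n   # w free, e >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]                                # fitted weights

def lp_predict(w, X):
    """Predict effort for new projects with the fitted weights."""
    return X @ w

if __name__ == "__main__":
    # Toy data: effort = 2 * size + 3 * team, with an intercept column.
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(20),
                         rng.uniform(1, 10, 20),    # hypothetical "size"
                         rng.uniform(1, 5, 20)])    # hypothetical "team"
    y = 2 * X[:, 1] + 3 * X[:, 2]
    print(np.round(lp_fit(X, y), 3))
```

Because the absolute-error objective is piecewise linear, the auxiliary variables `e` bound each residual from above, and the simplex (or HiGHS) solver recovers the least-absolute-deviations weights; on noiseless data the fit is exact.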
Supplemental Material
Supplemental movie, appendix, image, and software files for "Linear Programming as a Baseline for Software Effort Estimation" are available for download.