DOI: 10.5555/2486788.2487048 (ICSE Conference Proceedings, ACM)
research-article

Data science for software engineering

Published: 18 May 2013

ABSTRACT

Target audience: Software practitioners and researchers who want to understand the state of the art in using data science for software engineering (SE).

Content: In the age of big data, data science (the knowledge of deriving meaningful outcomes from data) is an essential skill for software engineers. It can be used to predict useful information about new projects based on completed projects. This tutorial offers core insights into the state of the art in this important field.

What participants will learn: Before data science: the tutorial discusses the tasks needed to deploy machine-learning algorithms in organizations (Part 1: Organization Issues). During data science: topics range from discretization to clustering to dichotomization and statistical analysis. And the rest: When local data is scarce, we show how to adapt data from other organizations to local problems. When privacy concerns block access, we show how to privatize data while still being able to mine it. When working with data of dubious quality, we show how to prune spurious information. When data or models seem too complex, we show how to simplify data mining results. When data is too scarce to support intricate models, we show methods for generating predictions. When the world changes and old models need to be updated, we show how to handle those updates. When the effect is too complex for one model, we show how to reason across ensembles of models.

Pre-requisites: This tutorial makes minimal use of maths or advanced algorithms and should be understandable by developers and technical managers.
