Data science for software engineering

Authors:
Tim Menzies, West Virginia University, USA
Ekrem Kocaguneli, West Virginia University, USA
Fayola Peters, West Virginia University, USA
Burak Turhan, University of Oulu, Finland
Leandro L. Minku, University of Birmingham, UK

ICSE '13: Proceedings of the 2013 International Conference on Software Engineering, May 2013, Pages 1484–1486

Published: 18 May 2013

ABSTRACT

Target audience: Software practitioners and researchers who want to understand the state of the art in using data science for software engineering (SE).

Content: In the age of big data, data science (the knowledge of deriving meaningful outcomes from data) is an essential skill for software engineers. It can be used to predict useful information about new projects based on completed projects. This tutorial offers core insights into the state of the art in this important field.

What participants will learn: Before data science: the tasks needed to deploy machine-learning algorithms to organizations (Part 1: Organization Issues). During data science: from discretization to clustering to dichotomization and statistical analysis. And the rest: When local data is scarce, we show how to adapt data from other organizations to local problems. When privacy concerns block access, we show how to privatize data while still being able to mine it. When working with data of dubious quality, we show how to prune spurious information. When data or models seem too complex, we show how to simplify data mining results. When data is too scarce to support intricate models, we show methods for generating predictions. When the world changes and old models need to be updated, we show how to handle those updates. When the effect is too complex for one model, we show how to reason across ensembles of models.

Pre-requisites: This tutorial makes minimal use of maths or advanced algorithms and should be understandable by developers and technical managers.

Index Terms

Data science for software engineering
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
2. Software and its engineering

Published in
ICSE '13: Proceedings of the 2013 International Conference on Software Engineering
May 2013
1561 pages
ISBN: 9781467330763
General Chair: David Notkin
Program Chairs: Betty H. C. Cheng, Klaus Pohl
Publisher: IEEE Press
Qualifiers: research-article

Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
