ABSTRACT
GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In recent years, a number of research papers have reported findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering through analyses performed on this platform, we believe it is worthwhile to reflect on how research papers have addressed the task of mining GitHub repositories in recent years. In this regard, we present a meta-analysis of 93 research papers that addresses three main dimensions of those papers: i) the empirical methods employed, ii) the datasets used and iii) the limitations reported. The results of our meta-analysis raise concerns about the dataset collection process and dataset size, the low level of replicability, poor sampling techniques, the lack of longitudinal studies and the limited variety of methodologies.