DOI: 10.1145/2901739.2901776
research-article

Findings from GitHub: methods, datasets and limitations

Published: 14 May 2016

ABSTRACT

GitHub, one of the most popular social coding platforms, is the platform of reference when mining open source repositories to learn from past experiences. In recent years, a number of research papers have reported findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering through analyses performed on this platform, we believe it is worthwhile to reflect on how research papers have addressed the task of mining GitHub repositories. To this end, we present a meta-analysis of 93 research papers that examines three main dimensions of those papers: i) the empirical methods employed, ii) the datasets used, and iii) the limitations reported. The results of our meta-analysis raise concerns regarding the dataset collection process and dataset size, the low level of replicability, poor sampling techniques, the lack of longitudinal studies, and the scarce variety of methodologies.

Published in: MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories, May 2016, 544 pages
ISBN: 9781450341868
DOI: 10.1145/2901739

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
