DOI: 10.1145/3302424.3303970
research-article
Open Access

Keeping Master Green at Scale

Published: 25 March 2019

ABSTRACT

Giant monolithic source-code repositories are one of the fundamental pillars of the back-end infrastructure in large and fast-paced software companies. The sheer volume of everyday code changes demands a reliable and efficient change management system with three uncompromisable key requirements: an always green master, high throughput, and low commit turnaround time. Green refers to a master branch that always successfully compiles and passes all build steps, the opposite being red. A broken (red) master leads to delayed feature rollouts because a faulty code commit needs to be detected and rolled back. Additionally, a red master has a cascading effect that hampers developer productivity: developers might face local test/build failures, or might end up working on a codebase that will eventually be rolled back.

This paper presents the design and implementation of SubmitQueue. It guarantees an always green master branch at scale: all build steps (e.g., compilation, unit tests, UI tests) successfully execute for every commit point. SubmitQueue has been in production for over a year, and can scale to thousands of daily commits to giant monolithic repositories.
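To make the "always green" invariant concrete, the following is a minimal sketch of a naive serial gate, not SubmitQueue's actual design, which this abstract does not describe. It assumes that each change can be tested on top of the current master head and that a build step is a function returning pass or fail. All names here (SerialGate, compiles, unit_tests, the change ids) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a build step inspects the candidate history and
# returns True on success. Nothing here is taken from the paper.
BuildStep = Callable[[List[str]], bool]


@dataclass
class SerialGate:
    """Naive gate: test one change at a time on top of the current master."""
    build_steps: List[BuildStep]
    master: List[str] = field(default_factory=list)    # landed change ids, oldest first
    rejected: List[str] = field(default_factory=list)  # changes that failed a step

    def submit(self, change_id: str) -> bool:
        candidate = self.master + [change_id]
        # all() stops at the first failing step, so later steps are skipped.
        if all(step(candidate) for step in self.build_steps):
            self.master = candidate   # every commit point on master passed all steps
            return True
        self.rejected.append(change_id)
        return False


# Toy build steps (assumptions): one change breaks compilation on its own,
# and two changes pass individually but conflict when combined.
def compiles(history: List[str]) -> bool:
    return "bad-syntax" not in history


def unit_tests(history: List[str]) -> bool:
    return not ({"rename-api", "call-old-api"} <= set(history))


gate = SerialGate(build_steps=[compiles, unit_tests])
results = [gate.submit(c) for c in
           ["rename-api", "bad-syntax", "call-old-api", "fix-typo"]]
print(results)        # [True, False, False, True]
print(gate.master)    # ['rename-api', 'fix-typo']
```

A gate like this keeps master green by construction, but it runs the build steps for one change at a time, so turnaround grows with the queue length. That is the tension between an always green master, high throughput, and low turnaround time that the abstract names.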

  • Published in

    EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
    March 2019
    714 pages
    ISBN:9781450362818
    DOI:10.1145/3302424

    Copyright © 2019 Owner/Author

    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 241 of 1,308 submissions, 18%
