ABSTRACT
Giant monolithic source-code repositories are one of the fundamental pillars of the back-end infrastructure in large and fast-paced software companies. The sheer volume of everyday code changes demands a reliable and efficient change management system with three uncompromisable key requirements: an always green master, high throughput, and low commit turnaround time. Green refers to a master branch that always successfully compiles and passes all build steps, the opposite being red. A broken (red) master leads to delayed feature rollouts because a faulty code commit needs to be detected and rolled back. Additionally, a red master has a cascading effect that hampers developer productivity: developers might face local test/build failures, or might end up working on a codebase that will eventually be rolled back.
This paper presents the design and implementation of SubmitQueue. It guarantees an always green master branch at scale: all build steps (e.g., compilation, unit tests, UI tests) successfully execute for every commit point. SubmitQueue has been in production for over a year, and can scale to thousands of daily commits to giant monolithic repositories.