Keeping Master Green at Scale

Authors:
Sundaram Ananthanarayanan (Uber Technologies)
Masoud Saeida Ardekani (Uber Technologies)
Denis Haenikel (Uber Technologies)
Balaji Varadarajan (Uber Technologies)
Simon Soriano (Uber Technologies)
Dhaval Patel (Uber Technologies)
Ali-Reza Adl-Tabatabai (Uber Technologies)

EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019, March 2019, Article No.: 29, Pages 1–15. https://doi.org/10.1145/3302424.3303970

Published: 25 March 2019

ABSTRACT

Giant monolithic source-code repositories are one of the fundamental pillars of the back-end infrastructure in large and fast-paced software companies. The sheer volume of everyday code changes demands a reliable and efficient change management system with three uncompromisable key requirements: always green master, high throughput, and low commit turnaround time. Green refers to a master branch that always successfully compiles and passes all build steps, the opposite being red. A broken (red) master leads to delayed feature rollouts because a faulty code commit needs to be detected and rolled back. Additionally, a red master has a cascading effect that hampers developer productivity: developers might face local test/build failures, or might end up working on a codebase that will eventually be rolled back.

This paper presents the design and implementation of SubmitQueue. It guarantees an always green master branch at scale: all build steps (e.g., compilation, unit tests, UI tests) successfully execute for every commit point. SubmitQueue has been in production for over a year, and can scale to thousands of daily commits to giant monolithic repositories.
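
To make the "always green" invariant concrete, here is a minimal sketch of the simplest possible gate: changes are tested one at a time against the current master and land only if every build step passes. This is not SubmitQueue's design, which the abstract does not describe; the names `land_serially`, `compiles`, and `unit_tests_pass`, and the string-labelled changes, are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical types for illustration: a change is just a label, and a build
# step is a predicate over the candidate history (master plus one new change).
Change = str
BuildStep = Callable[[List[Change]], bool]


def land_serially(
    master: List[Change],
    pending: Iterable[Change],
    build_steps: List[BuildStep],
) -> Tuple[List[Change], List[Change]]:
    """Land changes one at a time so that every commit point is green.

    A change is appended to master only if all build steps pass on
    master + [change]; otherwise it is rejected and master is unchanged.
    """
    landed = list(master)
    rejected: List[Change] = []
    for change in pending:
        candidate = landed + [change]
        if all(step(candidate) for step in build_steps):
            landed = candidate  # new commit point, verified green
        else:
            rejected.append(change)  # the faulty change never reaches master
    return landed, rejected


# Toy build steps (assumptions, not real tooling).
def compiles(history: List[Change]) -> bool:
    return "syntax-error" not in history


def unit_tests_pass(history: List[Change]) -> bool:
    # Two changes that are fine alone but conflict when combined.
    return not ("rename-api" in history and "call-old-api" in history)


master, rejected = land_serially(
    [],
    ["rename-api", "syntax-error", "call-old-api", "fix-typo"],
    [compiles, unit_tests_pass],
)
print(master)    # ['rename-api', 'fix-typo']
print(rejected)  # ['syntax-error', 'call-old-api']
```

Note that `call-old-api` is rejected only because `rename-api` landed first: a change must be validated against the master it will actually land on. Testing strictly one change at a time preserves the invariant but makes every change wait for all changes ahead of it, which is why high throughput and low turnaround time are listed as separate requirements.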

Published in

EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
March 2019
714 pages
ISBN: 9781450362818
DOI: 10.1145/3302424

Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 241 of 1,308 submissions, 18%
Article Metrics
- Total Citations: 6
- Total Downloads: 30,118
- Downloads (Last 12 months): 3,080
- Downloads (Last 6 weeks): 355