
Mars code

Published: 01 February 2014

Abstract

Redundant software (and hardware) ensured Curiosity reached its destination and functioned as its designers intended.



                        Reviews

                        Nathan Carlson

NASA is not the first organization that would be consulted when looking for software development best practices. One might assume that, like other organizations responsible for mission-critical applications, NASA follows a formal and therefore time-consuming approach to writing software, and that as a result its development processes have fallen behind those of more agile organizations. This article disproves the old adage "You can't teach an old dog new tricks," as illustrated by the innovative approach employed by NASA's Curiosity flight software team.

The article describes a number of practices that will interest organizations looking to improve their software development process. These practices, which derive from a desire to minimize risk without sacrificing schedule, will be of particular value to anyone working on mission-critical or safety-critical applications. The Curiosity team first implemented a risk-based coding standard, with different levels of compliance applied to subsystems according to their safety and criticality. The code review process was then re-engineered around multiple static code analysis tools, reducing and focusing in-person interaction so that agreement on defects and their resolution could be reached quickly. The team also used logic model checking tools throughout development to find complex concurrency-related defects.

The summary sentence at the top of the article ("Redundant software (and hardware) ensured Curiosity reached its destination and functioned as its designers intended") is misleading: the primary focus of the article is the innovative development practices applied to the Curiosity flight software, while software redundancy, though valuable, is addressed only briefly in the conclusion.
Online Computing Reviews Service
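The risk-based coding standard the review describes encourages a defensive style: statically bounded loops and a high density of assertions so that bad inputs are caught where they enter. The following is a hedged sketch of that style only; the function, constant, and buffer names are invented for this illustration and do not come from the article or the mission code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 64  /* fixed upper bound: every loop must be statically bounded */

/* Average a telemetry buffer defensively: preconditions are asserted
 * rather than silently tolerated, and the loop cannot exceed a
 * compile-time bound even if the length argument is wrong. */
static int average_sample(const int *buf, size_t n)
{
    long sum = 0;
    assert(buf != NULL);               /* validate every input */
    assert(n > 0 && n <= MAX_SAMPLES); /* reject out-of-range lengths */
    for (size_t i = 0; i < n && i < MAX_SAMPLES; i++) { /* bounded loop */
        sum += buf[i];
    }
    return (int)(sum / (long)n);
}
```

Rules in this spirit (fixed loop bounds, validation of all inputs, a minimum assertion density) appear in JPL's published C coding standard and in Holzmann's "power of ten" rules for safety-critical code.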

                        Yishai A. Feldman

Despite a few spectacular failures, NASA is one of the top organizations worldwide in systems and software engineering. The Mars rover Opportunity recently celebrated 10 years of operation on Mars, and the newer Curiosity has been operating since August 2012, after a landing maneuver of unprecedented complexity. This is quite a feat, considering that the nearest repair technician is over 30 million miles away and the communications delay averages about 15 minutes. How do they do it? This article gives a rare glimpse into the process used in programming the Mars rovers. It focuses on three methods used to reduce the risk of failure, and an important feature of all three is automation.

The first method consists of a set of risk-related coding rules, based on actual risks observed in previous missions. For example, one rule requires the code to pass compilation and static analyzer checks without any warnings; another requires a minimum assertion density of two percent. The assertions are enabled not only for testing but also during the mission, when violations place the rover into a safe mode in which failures can be diagnosed and fixed. All of the rules are checked automatically when the code is compiled.

The second method is tool-based code review. Four different static analysis tools are used, with an automated system to manage the interaction between developers and reviewers in response to tool warnings. In over 80 percent of the cases, developers agreed with the warnings and fixed the code without reviewer intervention. An interesting observation Holzmann makes is that there was surprisingly little overlap between the warnings issued by the four tools.

The third and strongest method is the use of a model checking tool to find race conditions, one of the most difficult types of bugs to discover. This cannot be done for the whole code base, but it was used extensively in the Curiosity mission to verify critical multithreaded algorithms. In some cases the tool works directly from the C source code; for larger subsystems, input models have to be prepared manually in the language of the model checker, which can reduce the input size by more than an order of magnitude.

The challenges faced by NASA are obviously unique, and very different from those involved in, say, cloud-based mobile applications. Many papers and books describe methodologies for developing such systems. However, safety-critical systems for cars, aircraft, medical devices, and power grids, or financial systems that must be resilient to faults and deliberate attacks, require reliability levels similar to NASA's. For developers of such systems, articles such as Holzmann's should be required reading.
Online Computing Reviews Service
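The race conditions that model checkers are used to find arise from unprotected read-modify-write sequences on shared state. A minimal sketch of the pattern being verified follows; the thread routine, counter, and iteration count are invented for this illustration and are not mission code. Two threads update a shared counter, and the final value is deterministic only because every access holds the same mutex; remove the lock and a model checker would find an interleaving where an increment is lost:

```c
#include <pthread.h>

static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Without the mutex, `counter++` is an unprotected read-modify-write:
 * exactly the kind of data race exhaustive state exploration exposes. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* protected critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run two workers to completion and return the final count. */
static long run_workers(void)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With the lock held, `run_workers()` always returns 200000; a tool such as Spin checks the analogous property over all interleavings of a model, rather than over the particular schedules a test run happens to exercise.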


Published in: Communications of the ACM, Volume 57, Issue 2 (February 2014), 103 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2556647
Editor: Moshe Y. Vardi

                          Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Qualifiers: research-article, Popular, Refereed
