
Mars code

Published: 01 February 2014

Abstract

Redundant software (and hardware) ensured Curiosity reached its destination and functioned as its designers intended.



                        Reviews

                        Nathan Carlson

NASA is not the first organization that would be consulted when looking for software development best practices. One might assume that, like other organizations responsible for mission-critical applications, NASA follows a formal and therefore time-consuming approach to writing software, and that as a result its development processes have fallen behind those of more agile organizations. This article disproves the old adage "You can't teach an old dog new tricks," as illustrated by the innovative approach employed by NASA's Curiosity flight software team.

The article describes a number of practices that will interest organizations looking to improve their software development process. These practices, which derive from a desire to minimize risk without sacrificing schedule, will be of particular value to anyone working on mission-critical or safety-critical applications. The Curiosity team first implemented a risk-based coding standard, with different levels of compliance applied to subsystems according to their safety and criticality. The code review process was then re-engineered around multiple static code analysis tools, reducing and focusing in-person interaction so that agreement on defects and their resolution could be reached quickly. The team also used logic model checking tools throughout development to find complex concurrency-related defects.

The summary sentence at the top of the article ("Redundant software (and hardware) ensured Curiosity reached its destination and functioned as its designers intended") is misleading: the primary focus of the article is the innovative development practices applied to the Curiosity flight software, while software redundancy, though valuable, is addressed only briefly in the conclusion.
Online Computing Reviews Service
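The risk-based coding standard the review describes encourages a defensive style: statically bounded loops and a high density of assertions so that bad inputs are caught where they enter. The following is a hedged sketch of that style only; the function, constant, and buffer names are invented for this illustration and do not come from the article or the mission code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 64  /* fixed upper bound: every loop must be statically bounded */

/* Average a telemetry buffer defensively: preconditions are asserted
 * rather than silently tolerated, and the loop cannot exceed a
 * compile-time bound even if the length argument is wrong. */
static int average_sample(const int *buf, size_t n)
{
    long sum = 0;
    assert(buf != NULL);               /* validate every input */
    assert(n > 0 && n <= MAX_SAMPLES); /* reject out-of-range lengths */
    for (size_t i = 0; i < n && i < MAX_SAMPLES; i++) { /* bounded loop */
        sum += buf[i];
    }
    return (int)(sum / (long)n);
}
```

Rules in this spirit (fixed loop bounds, validation of all inputs, a minimum assertion density) appear in JPL's published C coding standard and in Holzmann's "power of ten" rules for safety-critical code.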

                        Yishai A. Feldman

Despite a few spectacular failures, NASA is one of the top organizations worldwide in systems and software engineering. The Mars rover Opportunity recently celebrated 10 years of operation on Mars, and the newer Curiosity has been operating since August 2012, after a landing maneuver of unprecedented complexity. This is quite a feat, considering that the nearest repair technician is over 30 million miles away and the communications delay averages about 15 minutes. How do they do it? This article gives a rare glimpse into the process used in programming the Mars rovers. It focuses on three methods used to reduce the risk of failure, and an important feature of all three is automation.

The first method consists of a set of risk-related coding rules, based on actual risks observed in previous missions. For example, one rule requires the code to pass compilation and static analyzer checks without any warnings; another requires a minimum assertion density of two percent. The assertions are enabled not only for testing but also during the mission, when violations place the rover into a safe mode in which failures can be diagnosed and fixed. All of the rules are checked automatically when the code is compiled.

The second method is tool-based code review. Four different static analysis tools are used, with an automated system to manage the interaction between developers and reviewers in response to tool warnings. In over 80 percent of the cases, developers agreed with the warnings and fixed the code without reviewer intervention. An interesting observation Holzmann makes is that there was surprisingly little overlap between the warnings issued by the four tools.

The third and strongest method is the use of a model checking tool to find race conditions, one of the most difficult types of bugs to discover. This cannot be done for the whole code base, but it was used extensively in the Curiosity mission to verify critical multithreaded algorithms. In some cases the tool works directly from the C source code; for larger subsystems, input models have to be prepared manually in the language of the model checker, which can reduce the input size by more than an order of magnitude.

The challenges faced by NASA are obviously unique, and very different from those involved in, say, cloud-based mobile applications. Many papers and books describe methodologies for developing such systems. However, safety-critical systems for cars, aircraft, medical devices, and power grids, or financial systems that must be resilient to faults and deliberate attacks, require reliability levels similar to NASA's. For developers of such systems, articles such as Holzmann's should be required reading.
Online Computing Reviews Service
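The race conditions that model checkers are used to find arise from unprotected read-modify-write sequences on shared state. A minimal sketch of the pattern being verified follows; the thread routine, counter, and iteration count are invented for this illustration and are not mission code. Two threads update a shared counter, and the final value is deterministic only because every access holds the same mutex; remove the lock and a model checker would find an interleaving where an increment is lost:

```c
#include <pthread.h>

static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Without the mutex, `counter++` is an unprotected read-modify-write:
 * exactly the kind of data race exhaustive state exploration exposes. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* protected critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run two workers to completion and return the final count. */
static long run_workers(void)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

With the lock held, `run_workers()` always returns 200000; a tool such as Spin checks the analogous property over all interleavings of a model, rather than over the particular schedules a test run happens to exercise.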


Published in: Communications of the ACM, Volume 57, Issue 2 (February 2014), 103 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2556647
Editor: Moshe Y. Vardi

                          Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Qualifiers: research-article, Popular, Refereed
