Abstract
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with examples of how computer system builders can better support floating-point.
Index Terms
- What every computer scientist should know about floating-point arithmetic