Floating-point arithmetic

In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers.^[1]^: 3^[2]^: 10 For example, 12.345 is a floating-point number in base ten with five digits of precision:

$12.345=\!\underbrace {12345} _{\text{significand}}\!\times \!\underbrace {10} _{\text{base}}\!\!\!\!\!\!\!\overbrace {{}^{-3}} ^{\text{exponent}}$

However, unlike 12.345, 12.3456 is not a floating-point number in base ten with five digits of precision—it needs six digits of precision; the nearest floating-point number with only five digits is 12.346. In practice, most floating-point systems use base two, though base ten (decimal floating point) is also common.

Floating-point arithmetic operations, such as addition and division, approximate the corresponding real number arithmetic operations by rounding any result that is not a floating-point number itself to a nearby floating-point number.^[1]^: 22^[2]^: 10 For example, in a floating-point arithmetic with five base-ten digits of precision, the sum 12.345 + 1.0001 = 13.3451 might be rounded to 13.345.

The term floating point refers to the fact that the number's radix point can "float" anywhere to the left, right, or between the significant digits of the number. This position is indicated by the exponent, so floating point can be considered a form of scientific notation.

A floating-point system can be used to represent, with a fixed number of digits, numbers of very different orders of magnitude — such as the number of meters between galaxies or between protons in an atom. For this reason, floating-point arithmetic is often used to allow very small and very large real numbers that require fast processing times. The result of this dynamic range is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers varies with their exponent.^[3]

Augmented version above showing both signs of representable values

Over the years, a variety of floating-point representations have been used in computers. In 1985, the IEEE 754 Standard for Floating-Point Arithmetic was established, and since the 1990s, the most commonly encountered representations are those defined by the IEEE.

The speed of floating-point operations, commonly measured in terms of FLOPS, is an important characteristic of a computer system, especially for applications that involve intensive mathematical calculations.

A floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers.

^ ^a ^b Cite error: The named reference Muller_2010 was invoked but never defined (see the help page).
^ ^a ^b Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. ISBN 0-13-322495-3.
^ Cite error: The named reference Smith_1997 was invoked but never defined (see the help page).

[Muller_2010-1] Cite error: The named reference Muller_2010 was invoked but never defined (see the help page).

[sterbenz1974fpcomp-2] Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. ISBN 0-13-322495-3.

[Smith_1997-3] Cite error: The named reference Smith_1997 was invoked but never defined (see the help page).

[1]

[2]

[3]