How Computers Handle Floating-Point Numbers

Hasitha Subhashana
4 min read · May 15, 2021

Floating-point numbers are, as their name implies, numbers that contain a floating decimal point. The most commonly used floating-point standard is the IEEE standard, so let’s look at what this standard is.

In computers, there is a standard called IEEE 754, the IEEE Standard for Floating-Point Arithmetic. This standard describes how a computer should represent and handle floating-point numbers; if you’d like further information, you can find it in the standard document itself.

According to this standard, a floating-point number should be divided into three components.

  1. Sign bit.
  2. Exponent.
  3. Mantissa.
Figure 1: Single precision IEEE 754 Floating-point standard (Source:https://www.geeksforgeeks.org/)
Figure 2: Double-precision IEEE 754 Floating-point standard (Source:https://www.geeksforgeeks.org/)
Single, double, long-double precision representation

Let’s use 9.1, a single-precision value, as our example to see how this representation works. To convert it to the IEEE 754 standard, you have to follow the steps below.

  1. Convert the floating-point number into binary.
  2. Write the converted binary in scientific format.
  3. Write the binary (which is written in scientific format) according to IEEE 754 standard.

At the end of this conversion, 9.1 will be converted to a binary value with a sign bit, an exponent, and a mantissa.

So, first, convert 9.1 into binary.

In our example, 9 is the integral part and 0.1 is the fractional part, so they should be converted separately.

Converting 9.1 to binary

Integral part 9 = 1001

Fraction part .1 = 000110011001100….

So combine both, and you will get the binary value of 9.1, which is 1001.0001100110011001100…
Even though 9.1’s binary expansion is infinite, we have only 23 bits of mantissa to store it.
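The conversion above can be sketched in Python. This is a minimal illustration (the function names are my own, not from any standard library): the integral part is converted with repeated division by 2, the fractional part with repeated multiplication by 2.

```python
def int_to_binary(n):
    # Integral part: Python's bin() performs the repeated division by 2 for us.
    return bin(n)[2:]

def fraction_to_binary(frac, bits):
    # Fractional part: multiply by 2 each round; the integer carry is the next bit.
    out = []
    for _ in range(bits):
        frac *= 2
        bit = int(frac)
        out.append(str(bit))
        frac -= bit
    return "".join(out)

print(int_to_binary(9))             # 1001
print(fraction_to_binary(0.1, 15))  # 000110011001100
```

Running it reproduces the two pieces above: 9 becomes 1001, and 0.1 begins the repeating pattern 0001100110011… that never terminates.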

Now let’s write this in scientific notation as below.
9.1 in scientific notation — 1.0010001100110011001100… × 2³

Note how the decimal point has shifted.

Next, this number should be written in IEEE 754 format.

The first component in the IEEE 754 floating-point standard is the sign bit.

  • If the number is positive (+), the sign bit should be “0”.
  • If the number is negative (−), the sign bit should be “1”.

So, according to our example, the sign bit should be “0”.

Now let’s look into the second component.

Here the exponent is an 8-bit value, since this is a single-precision number, and its bias is 127. You calculate the binary of bias + exponent: here 130 = 127 + 3, and the binary of 130 is 10000010. This is our biased exponent.
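The biased-exponent step is a one-liner to check in Python:

```python
bias = 127      # single-precision bias
exponent = 3    # from 1.001... x 2^3
biased = bias + exponent
print(biased, format(biased, "08b"))  # 130 10000010
```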

Then the remaining component is the mantissa. The mantissa is nothing other than the fractional part of 9.1’s binary scientific notation. But note that the leading 1 is omitted, since all numbers except zero begin with a leading 1.
So now, according to the IEEE 754 standard, 9.1 should look like this: 0 10000010 00100011001100110011001. This is how all modern computers and CPUs handle 9.1 internally.

9.1 in IEEE 754 standard
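The assembly of the three fields can be sketched like this (the mantissa is simply truncated here; rounding is discussed next):

```python
sign = "0"                                   # 9.1 is positive
exponent = format(127 + 3, "08b")            # bias 127 + exponent 3 -> 10000010
binary_9_1 = "1.00100011001100110011001100"  # 1.001000110011... x 2^3
mantissa = binary_9_1[2:][:23]               # drop the leading "1.", keep 23 bits
print(sign, exponent, mantissa)  # 0 10000010 00100011001100110011001
```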

But if you use a computer or an IEEE-754 Floating Point Converter to calculate 9.1’s IEEE 754 representation, you will see a change in the last bit.

0 10000010 00100011001100110011010

We get this value because computers round the infinite binary expansion to the nearest representable 23-bit mantissa rather than simply truncating it. You can see this for yourself using the IEEE-754 Floating Point Converter.
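You can also verify this on your own machine: Python’s standard `struct` module lets you pack 9.1 as a 32-bit float and read back the raw bits, revealing the rounded last mantissa bit.

```python
import struct

# Pack 9.1 as a big-endian 32-bit float, then reinterpret the same 4 bytes
# as an unsigned integer to get the raw bit pattern.
bits = struct.unpack(">I", struct.pack(">f", 9.1))[0]
s = format(bits, "032b")
print(s[0], s[1:9], s[9:])  # 0 10000010 00100011001100110011010
```

Note the final bit is 0, not 1 as in the hand-truncated version above: the hardware rounds to nearest.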

Well, I know what you’re thinking: it’s just one bit. But this could give very unintended results in your programming language. Still don’t trust me? OK, in my upcoming article, let’s see how this can affect your so-called high-level programming languages. Until then, stay safe and take care.
