Storing Real Numbers

We begin with a review of scientific notation. When we have to write very large or very small numbers, it is easier to write them in scientific notation. For example,
351,000,000,000 = 3.51 x 10^11
and
0.000000000124 = 1.24 x 10^-10
Notice that we place the decimal point just to the right of the leftmost nonzero digit in the value. We call the leftmost nonzero digit the most significant digit. In the text the decimal point is placed to the left of the most significant digit, but we shall always put the decimal point to its right.

The form of a number in scientific notation is

n = f x 10^e
where f is a signed fractional part called a mantissa and e is a signed integer exponent.
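
As a quick check of the form above, here is a minimal Python sketch that splits a number into f and e. The function name to_scientific and its digits parameter are our own choices for illustration; the sketch simply relies on Python's built-in "e" number formatting.

```python
def to_scientific(n, digits=3):
    """Return (f, e) such that n is approximately f x 10^e,
    with f rounded to `digits` significant digits."""
    text = f"{n:.{digits - 1}e}"      # e.g. '3.51e+11'
    f_text, e_text = text.split("e")  # mantissa text and exponent text
    return float(f_text), int(e_text)

print(to_scientific(351000000000))    # (3.51, 11)
print(to_scientific(0.000000000124))  # (1.24, -10)
```

Note that the decimal point lands just to the right of the most significant digit, as in the examples above.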

On a computer, a real number is often called a floating point number. Floating point numbers are stored in a fixed number of bits of computer memory. The exact number of bits depends on the computer system being used. Using a fixed number of bits to store floating point numbers produces a phenomenon called finite precision. We cannot store every real number precisely. Instead, we must round off any number that has too many digits to fit in our limited number of bits and store an approximation for that number. The loss of precision due to round-off is called round-off error.
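
Round-off error is easy to observe. The sketch below assumes that Python floats are stored as 64-bit binary floating point numbers, which is the case on typical systems. The helper name roundoff_error is hypothetical. It compares the value actually stored with the decimal value we typed.

```python
from fractions import Fraction

def roundoff_error(decimal_text):
    """Exact difference between the stored float and the decimal text."""
    stored = Fraction(float(decimal_text))  # exact value of the stored bits
    intended = Fraction(decimal_text)       # exact value of the decimal text
    return stored - intended

print(roundoff_error("0.5"))       # 0: 0.5 fits exactly in binary
print(roundoff_error("0.1") != 0)  # True: 0.1 has to be rounded off
print(0.1 + 0.2 == 0.3)            # False: a visible effect of round-off
```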

On a computer we store floating point numbers in binary. This means we have numbers of the form

n = f x 2^e
where f is a signed mantissa and e is a signed exponent. The bits allocated for a floating point number are divided up into a sign bit, a certain number of bits for an exponent, and a certain number of bits for a mantissa. The sign bit is used to store the sign of the mantissa. Typically, 0 means positive and 1 means negative. Note that there is no sign bit for the exponent even though the exponent is a signed value. We use a different system for keeping track of the sign of the exponent which we shall discuss momentarily.

For our examples, we will use a system laid out as follows:

[Figure: layout of the floating point format]

In this system we use 16 bits for each floating point number: one sign bit, five bits for the exponent and ten bits for the mantissa.

Suppose we wish to store the value -122.5. First we convert the value to binary.

-122.5 (base 10) = -1111010.1 (base 2)
Next we normalize the binary number. We move the radix point so that it lies just to the right of the leftmost nonzero digit. Normalizing the above number gives us:
-1111010.1 = -1.1110101 x 2^6
Thus we have a sign bit of 1 (for negative), a mantissa of 1110101 and an exponent of 6. (In the mantissa, we do not store the 1 that appears before the point, because this digit will always be a one.) The exponent 6 must be stored in binary. It also must be stored in some way that keeps track of its sign since there is no sign bit for the exponent.
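
The conversion and normalization steps can be sketched in Python as follows. The function names to_binary and normalize are our own, and normalize leans on the standard library function math.frexp rather than moving the radix point by hand. Both functions expect a positive value; the sign is handled separately by the sign bit.

```python
import math

def to_binary(value, max_frac_bits=10):
    """Binary digits of a non-negative value, e.g. 122.5 -> '1111010.1'."""
    whole = int(value)
    frac = value - whole
    digits = bin(whole)[2:] + "."
    count = 0
    while frac > 0 and count < max_frac_bits:
        frac *= 2          # move the next fraction bit left of the point
        bit = int(frac)
        digits += str(bit)
        frac -= bit
        count += 1
    if count == 0:
        digits += "0"      # no fraction bits were needed
    return digits

def normalize(value):
    """Return (m, e) with value == m x 2^e and 1 <= m < 2, for value > 0."""
    m, e = math.frexp(value)  # frexp gives value == m * 2**e, 0.5 <= m < 1
    return m * 2, e - 1       # rescale so the leading digit is 1

print(to_binary(122.5))       # 1111010.1
m, e = normalize(122.5)
print(to_binary(m), e)        # 1.1110101 6
```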

Our system uses five bits for the exponent. In five bits we can store the patterns 00000, 00001, 00010, ..., 11111. We could use these patterns as their unsigned binary values 0 through 31, but that wouldn't give us any negative exponents. Instead, we use an excess-(2^(k-1) - 1) system, where k is the number of bits used for the exponent. Our system allocates 5 bits for the exponent, so we use an excess-15 system. This means that we add 15 to the true exponent and store the result, which lets us store exponents between -14 and +15. To store -14 we first add 15 to it to get 1, then record the bits 00001. To store 15 we first add 15 to it to get 30, then record 11110. As another example, to store an exponent of 0 we first add 15 to it to get 15, then record 01111. The pattern 00000 is reserved for the number zero, and the pattern 11111 is reserved for indicating overflow, that is, that a number is too large or too small to be represented. The following table shows the relationship between true exponents and stored exponents.

True Exponent   Stored Exponent   True Exponent   Stored Exponent
-15             XXXXX             1               10000
-14             00001             2               10001
-13             00010             3               10010
-12             00011             4               10011
-11             00100             5               10100
-10             00101             6               10101
-9              00110             7               10110
-8              00111             8               10111
-7              01000             9               11000
-6              01001             10              11001
-5              01010             11              11010
-4              01011             12              11011
-3              01100             13              11100
-2              01101             14              11101
-1              01110             15              11110
0               01111             16              XXXXX
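
The excess-15 rule can be written as a short Python sketch. The names K, EXCESS, encode_exponent and decode_exponent are hypothetical. Raising an error for exponents outside -14 to +15 is our own choice for how to handle the reserved patterns.

```python
K = 5                      # number of exponent bits in our system
EXCESS = 2 ** (K - 1) - 1  # 15 for a 5-bit exponent

def encode_exponent(true_exponent):
    """Add the excess and return the stored exponent as a 5-bit string."""
    if not -14 <= true_exponent <= 15:
        raise ValueError("exponent cannot be stored in this system")
    return format(true_exponent + EXCESS, "05b")

def decode_exponent(stored_bits):
    """Subtract the excess from a stored 5-bit exponent string."""
    return int(stored_bits, 2) - EXCESS

print(encode_exponent(-14))      # 00001
print(encode_exponent(0))        # 01111
print(encode_exponent(15))       # 11110
print(decode_exponent("10101"))  # 6
```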

In the example we were working on, we required an exponent of 6. We add the excess 15 to 6 to get 21, then store 21 in binary. We thus record 10101 for our exponent.

Putting it all together, the value -122.5 is stored as

1 10101 1110101000
We have added three 0's to the right-hand end of the mantissa to fill out its 10 digits, and we have inserted spaces to enable us to see the different parts of the number more easily.
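
Putting the steps together, here is a minimal Python sketch of an encoder for our 16-bit system. The function name encode16 is hypothetical. Rounding the mantissa to the nearest 10-bit value and raising an error for out-of-range exponents are assumptions of this sketch, not rules stated above.

```python
import math

def encode16(value):
    """Return 'sign exponent mantissa' bit strings for our 16-bit system."""
    if value == 0:
        return "0 00000 0000000000"   # zero is stored as all 0's
    sign = "1" if value < 0 else "0"
    m, e = math.frexp(abs(value))     # abs(value) == m * 2**e, 0.5 <= m < 1
    m, e = m * 2, e - 1               # normalize so that 1 <= m < 2
    frac = round((m - 1) * 2 ** 10)   # drop the leading 1, keep 10 bits
    if frac == 2 ** 10:               # rounding carried into the leading 1
        frac, e = 0, e + 1
    if not -14 <= e <= 15:
        raise OverflowError("exponent cannot be stored in 5 bits")
    return f"{sign} {e + 15:05b} {frac:010b}"

print(encode16(-122.5))   # 1 10101 1110101000
```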

Now let's start with a stored floating point number and figure out its decimal value. Here is a stored value:

0 10001 0101000000
The sign bit tells us that the number is positive. The exponent is 10001 which has an unsigned binary value of 17. In order to get the true value of the exponent we must subtract the excess of 15. So our true exponent is 17 - 15 = 2. The mantissa is .0101, so our value is
1.0101 x 2^2 = 101.01
Thus the value of the floating point number is 5.25 in base 10.
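
The decoding steps can be sketched the same way. The function name decode16 is hypothetical, and it expects the three parts of the stored value separated by spaces, as we have been writing them. Treating the exponent pattern 00000 as zero and raising an error for 11111 follows the reserved patterns described earlier.

```python
def decode16(bits):
    """Decode a 'sign exponent mantissa' string from our 16-bit system."""
    sign_bit, exp_bits, man_bits = bits.split()
    if exp_bits == "00000":
        return 0.0                              # reserved for the number zero
    if exp_bits == "11111":
        raise OverflowError("reserved pattern: overflow")
    e = int(exp_bits, 2) - 15                   # subtract the excess of 15
    significand = 1 + int(man_bits, 2) / 2 ** 10  # restore the leading 1
    value = significand * 2 ** e
    return -value if sign_bit == "1" else value

print(decode16("0 10001 0101000000"))   # 5.25
print(decode16("1 10101 1110101000"))   # -122.5
```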

One final note: the number 0 is stored as all 0's.