Storing Real Numbers

We begin with a review of scientific notation. When we have to write very large or very small numbers, it is easier to write them in scientific notation. For example,

351,000,000,000 = 3.51 x 10¹¹

and

0.000000000124 = 1.24 x 10⁻¹⁰

Notice that we place the decimal point just to the right of the leftmost nonzero digit in the value. We call the leftmost nonzero digit the most significant digit. In the text the decimal point is placed to the left of the most significant digit, but we shall always put the decimal point to its right.

The form of a number in scientific notation is

n = f x 10^e

where f is a signed fractional part called a mantissa and e is a signed integer exponent.
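
As a quick check of the two examples above, here is a minimal Python sketch that splits a number into the mantissa f and exponent e. The helper name to_scientific is made up for this illustration; it relies only on Python's built-in "e" format specifier.

```python
def to_scientific(n: float, digits: int = 2) -> str:
    """Format n as 'f x 10^e', keeping `digits` digits after the point."""
    # The "e" format yields text such as '3.51e+11'; split it at the 'e'.
    mantissa, exponent = f"{n:.{digits}e}".split("e")
    return f"{mantissa} x 10^{int(exponent)}"


print(to_scientific(351_000_000_000))   # 3.51 x 10^11
print(to_scientific(0.000000000124))    # 1.24 x 10^-10
```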

On a computer, a real number is often called a floating point number. Floating point numbers are stored in a fixed number of bits of computer memory. The exact number of bits depends on the computer system being used. Using a fixed number of bits to store floating point numbers produces a phenomenon called finite precision. We cannot store every real number precisely. Instead, we must round off any number that has too many digits to fit in our limited number of bits and store an approximation for that number. The loss of precision due to round-off is called round-off error.
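
Round-off error is easy to observe. The sketch below uses Python's built-in float type (64 bits on typical systems, which is an assumption about the platform and not the 16-bit system used in these notes) to show that 0.1 cannot be stored exactly.

```python
from decimal import Decimal

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 has a finite binary expansion, so each is rounded
# when stored, and the sum is not exactly the stored value of 0.3.
print(total == 0.3)              # False

# Decimal(0.1) shows the exact value that was stored: slightly above 0.1.
print(Decimal(0.1))

# Compare floating point numbers with a tolerance instead of ==.
print(abs(total - 0.3) < 1e-9)   # True
```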

On a computer we store floating point numbers in binary. This means we have numbers of the form

n = f x 2^e

where f is a signed mantissa and e is a signed exponent. The bits allocated for a floating point number are divided up into a sign bit, a certain number of bits for an exponent, and a certain number of bits for a mantissa. The sign bit is used to store the sign of the mantissa. Typically, 0 means positive and 1 means negative. Note that there is no sign bit for the exponent even though the exponent is a signed value. We use a different system for keeping track of the sign of the exponent which we shall discuss momentarily.

For our examples, we will use a system laid out as follows:

[Figure: layout of a floating point number, showing the sign bit, the exponent bits and the mantissa bits]

In this system we use 16 bits for each floating point number: one sign bit, five bits for the exponent and ten bits for the mantissa.

Suppose we wish to store the value -122.5. First we convert the value to binary.

-122.5₁₀ = -1111010.1₂
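
The conversion can be sketched in Python as follows. The function name to_binary and the max_frac_bits limit are assumptions made for this illustration: the integer part is converted with the built-in bin, and the fractional part by repeated doubling.

```python
def to_binary(value: float, max_frac_bits: int = 10) -> str:
    """Return value as a binary string such as '-1111010.1'."""
    sign = "-" if value < 0 else ""
    value = abs(value)
    whole = int(value)
    frac = value - whole

    digits = bin(whole)[2:]          # integer part, without the '0b' prefix

    # Fractional part: double it repeatedly; each integer carry is a bit.
    frac_bits = []
    while frac > 0 and len(frac_bits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        frac_bits.append(str(bit))
        frac -= bit

    if frac_bits:
        return sign + digits + "." + "".join(frac_bits)
    return sign + digits


print(to_binary(-122.5))   # -1111010.1
```

Fractions with no finite binary expansion are cut off after max_frac_bits bits, so the result is then only an approximation.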

Next we normalize the binary number. We move the radix point so that it lies just to the right of the leftmost nonzero digit. Normalizing the above number gives us:

-1111010.1 = -1.1110101 x 2⁶
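
Normalization can be checked with Python's math.frexp. Note that frexp places the radix point to the left of the most significant digit (its mantissa lies between 0.5 and 1), so this sketch shifts by one position to match the convention used here. The function name normalize is an assumption for this illustration.

```python
import math


def normalize(value: float) -> tuple[float, int]:
    """Return (f, e) with value == f * 2**e and 1 <= |f| < 2."""
    if value == 0:
        raise ValueError("zero has no leftmost nonzero digit")
    f, e = math.frexp(value)   # value == f * 2**e with 0.5 <= |f| < 1
    return f * 2, e - 1        # move the radix point one place right


print(normalize(-122.5))   # (-1.9140625, 6)
```

The mantissa -1.9140625 is the decimal value of -1.1110101 in binary.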

Thus we have a sign bit of 1 (for negative), a mantissa of 1110101 and an exponent of 6. (In the mantissa, we do not store the 1 that appears before the point, because this digit will always be a one.) The exponent 6 must be stored in binary. It also must be stored in some way that keeps track of its sign since there is no sign bit for the exponent.

Our system uses five bits for the exponent. In five bits we can store the patterns 00000, 00001, 00010, ..., 11111. We could read these patterns as the unsigned binary values 0 through 31, but that would give us no negative exponents. Instead, we use an excess-(2^(k-1) - 1) system, where k is the number of bits used for the exponent. Our system allocates 5 bits for the exponent, so we use an excess-15 system: we add 15 to the true exponent and store the result. This lets us store exponents from -14 to +15. To store -14 we add 15 to get 1, then record the bits 00001. To store 15 we add 15 to get 30, then record 11110. To store an exponent of 0 we add 15 to get 15, then record 01111. The pattern 00000 is reserved for the number zero, and the pattern 11111 is reserved for indicating overflow, that is, a number too large or too small to be represented. The following table shows the relationship between true exponents and stored exponents; XXXXX marks a true exponent that cannot be stored.

True Exponent   Stored Exponent   True Exponent   Stored Exponent
-15             XXXXX             1               10000
-14             00001             2               10001
-13             00010             3               10010
-12             00011             4               10011
-11             00100             5               10100
-10             00101             6               10101
-9              00110             7               10110
-8              00111             8               10111
-7              01000             9               11000
-6              01001             10              11001
-5              01010             11              11010
-4              01011             12              11011
-3              01100             13              11100
-2              01101             14              11101
-1              01110             15              11110
0               01111             16              XXXXX
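
The excess-15 rule behind the table can be sketched in a few lines of Python. The names store_exponent and load_exponent are assumptions for this illustration; the two reserved patterns are rejected as ordinary exponents.

```python
EXPONENT_BITS = 5
BIAS = 2 ** (EXPONENT_BITS - 1) - 1    # 15 for a 5-bit exponent


def store_exponent(true_exp: int) -> str:
    """Add the excess and return the stored 5-bit pattern."""
    if not -14 <= true_exp <= 15:
        raise ValueError("exponent outside the storable range -14..15")
    return format(true_exp + BIAS, f"0{EXPONENT_BITS}b")


def load_exponent(bits: str) -> int:
    """Subtract the excess from a stored pattern to get the true exponent."""
    stored = int(bits, 2)
    if stored in (0, 2 ** EXPONENT_BITS - 1):
        raise ValueError("00000 and 11111 are reserved patterns")
    return stored - BIAS


print(store_exponent(-14))    # 00001
print(store_exponent(6))      # 10101
print(load_exponent("10001")) # 2
```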

In the example we were working on, we required an exponent of 6. We add the excess 15 to 6 to get 21, then store 21 in binary. We thus record 10101 for our exponent.

Putting it all together, the value -122.5 is stored as

1 10101 1110101000

We have added three 0's to the right-hand end of the mantissa to fill out its 10 digits, and we have inserted spaces to enable us to see the different parts of the number more easily.
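
The whole storing procedure can be sketched as one Python function. The name encode is an assumption for this illustration, and the sketch simply truncates mantissa bits that do not fit instead of rounding them. As a cross-check, Python's struct module can pack a value in the IEEE 754 half-precision format (format code "e"), which happens to use the same field widths and the same excess of 15; it treats the reserved exponent patterns differently, but ordinary values such as -122.5 should produce the same bits.

```python
import math
import struct


def encode(value: float) -> str:
    """Return 'sign exponent mantissa' for the 16-bit system."""
    if value == 0:
        return "0 00000 0000000000"          # zero is stored as all 0's
    sign = "1" if value < 0 else "0"
    f, e = math.frexp(abs(value))            # 0.5 <= f < 1
    f, e = f * 2, e - 1                      # normalize so that 1 <= f < 2
    if not -14 <= e <= 15:
        raise OverflowError("exponent does not fit in five bits")
    # Drop the leading 1 and keep ten bits (extra bits are truncated).
    mantissa = int((f - 1) * 2 ** 10)
    return f"{sign} {e + 15:05b} {mantissa:010b}"


print(encode(-122.5))   # 1 10101 1110101000

# Cross-check against IEEE 754 half precision via the struct module.
ieee = format(int.from_bytes(struct.pack(">e", -122.5), "big"), "016b")
print(ieee == encode(-122.5).replace(" ", ""))   # True
```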

Now let's start with a stored floating point number and figure out its decimal value. Here is a stored value:

0 10001 0101000000

The sign bit tells us that the number is positive. The exponent is 10001, which has an unsigned binary value of 17. To get the true value of the exponent we must subtract the excess of 15, so our true exponent is 17 - 15 = 2. The stored mantissa 0101000000 represents the fraction .0101. Restoring the leading 1 that was not stored, our value is

1.0101 x 2² = 101.01

Thus the value of the floating point number is 5.25 in base 10.
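
The decoding steps can be sketched in Python as well. The function name decode and the space-separated input format are assumptions for this illustration.

```python
def decode(bits: str) -> float:
    """Decode a 'sign exponent mantissa' string from the 16-bit system."""
    sign_bit, exp_bits, man_bits = bits.split()
    if exp_bits == "00000":
        return 0.0                            # reserved for the number zero
    if exp_bits == "11111":
        raise OverflowError("11111 is reserved for indicating overflow")
    sign = -1 if sign_bit == "1" else 1
    exponent = int(exp_bits, 2) - 15          # subtract the excess
    mantissa = 1 + int(man_bits, 2) / 2 ** 10 # restore the unstored leading 1
    return sign * mantissa * 2 ** exponent


print(decode("0 10001 0101000000"))   # 5.25
print(decode("1 10101 1110101000"))   # -122.5
```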

One final note: the number 0 is stored as all 0's.
