Encoding Basics III


Bits as Possibilities, NOT Numbers

In the last post, we mentioned that 1111 1010 represents -6 in computers, not 250. Why do we think it should be 250? Because, by the normal conversion procedure described in the last post, converting 1111 1010 to base-10 gives 250, and converting 250 in base-10 to base-2 gives 1111 1010. So what is wrong?

The problem is that our computers do not use this simple conversion scheme to represent base-10 numbers in base-2. If they did, how would we represent negative numbers? We cannot attach a sign the way we do in mathematics; remember, computers do not have signs. They only have 0s and 1s. So, instead of

0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
...
1111 1110 = 254
1111 1111 = 255

We can devise a scheme like this:

0000 0000 = -128
0000 0001 = -127
0000 0010 = -126
...
1111 1110 = 126
1111 1111 = 127

This scheme accommodates negative numbers. What about numbers beyond -128 and 127? You have to use more bits to represent them. In fact, a common data type for signed integers in many programming languages is int, which often occupies 4 bytes. This means it can represent 2^{32} different possibilities. Instead of letting them represent 0 to 2^{32}-1, we can let them represent -2^{31} to 2^{31}-1.

How is this even allowed? After all, mathematically, 0000 0000 in binary is never -128 in base-10. How could we impose such a definition which is mathematically incorrect? The reason is that the 0000 0000 here is not a number, but a possibility. A Byte of information gives us 256 possibilities and we can use these to represent any 256 distinct numbers: 0 to 255, or -128 to 127, or all odd numbers within 1 to 511…
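To make "bits as possibilities" concrete, here is a minimal Python sketch (the variable names are mine, purely for illustration) that attaches three different correspondence schemes to the same 256 byte patterns, and notes the range a 4-byte int can cover:

```python
# A byte is just 256 possibilities; the integers 0..255 are only one way to label them.
patterns = range(256)

unsigned = {p: p for p in patterns}           # 0 to 255
shifted = {p: p - 128 for p in patterns}      # -128 to 127, the scheme sketched above
odd_only = {p: 2 * p + 1 for p in patterns}   # all odd numbers within 1 to 511

print(unsigned[0b11111010], shifted[0b11111010], odd_only[0b11111010])  # 250 122 501

# A 4-byte int has 2**32 patterns; letting them stand for -2**31 to 2**31 - 1
# covers negative numbers without needing any extra "sign" symbol.
print(2 ** 32, -2 ** 31, 2 ** 31 - 1)
```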

The key is to select a correspondence scheme meaningfully, so that numerical values can be represented as completely as possible with minimal complexity, both for the machine operating on them and for human readability. Hence, we cannot just use the mathematical conversion as the correspondence scheme, because it does not allow us to represent negative numbers.

True Form, and Two’s Complement

Representing Negative Numbers

The most intuitive way of representing negative numbers is to imitate what we do in mathematics. We can add a “sign” bit at the front of each binary sequence, using 0 for positive and 1 for negative. Thus, when machines interpret the numbers, they first have to know that the first bit contributes not to the value but to the sign. For example, 0000 1001 is 9 but 1000 1001 is -9. The full list is:

0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
...
0111 1111 = 127
1000 0000 = -0
1000 0001 = -1
1000 0010 = -2
...
1111 1110 = -126
1111 1111 = -127

This is the True Form (or Sign/Magnitude) scheme (that is, we say that 1111 1110 is the true form of the number -126). Let us go a little deeper. With the conventional mathematical conversion, 1000 0001 is 129, but as a true form it really represents -1. Interpreted another way, we have

T = \begin{cases} C, & \text{if C < M } \\ M - C, & \text{if C >= M}\end{cases}

Or,

T = \begin{cases} C, & \text{if C < M } \\ - (C \bmod M), & \text{if C >= M}\end{cases}

where T is the true value, C is the value obtained by converting the true form to base-10 using the mathematical conversion, and M is the smallest positive number that cannot be expressed with n-1 bits, i.e. M = 2^{n-1} (for example, M = 2^{8-1} = 128 in this case, where n = 8).
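As a quick sanity check of the formula, here is a small Python sketch (the function name is mine) that interprets an 8-bit pattern as a True Form number:

```python
def true_form_value(c: int, n_bits: int = 8) -> int:
    """Interpret the n-bit pattern whose unsigned value is C as a True Form (sign/magnitude) number."""
    m = 2 ** (n_bits - 1)            # M = 2^(n-1), i.e. 128 for one byte
    return c if c < m else m - c     # equivalently -(C mod M) when C >= M

print(true_form_value(0b00001001))   # 9
print(true_form_value(0b10001001))   # -9
print(true_form_value(0b11111111))   # -127
```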

This works, but clearly not optimally. For one, why do we have both a 0 and a -0? This causes problems in computer operation (the computer will report 0 != -0, since 0000 0000 != 1000 0000), and it really does not look elegant.

The second way of writing the true value inspires a more elegant solution, Two's Complement, which uses the same idea of modulo. What about:

T = C \bmod (2M)

Since this is a modulo operation, we take T to be the representative that falls into the integer range [-M, M-1]. This gives:

0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
...
0111 1111 = 127
1000 0000 = -128
1000 0001 = -127
1000 0010 = -126
...
1111 1110 = -2
1111 1111 = -1

This scheme is called Two's Complement, and it is how positive and negative integers are represented in modern computers. Why use Two's Complement? It does not have two 0s, and it is also intuitive (if you think about modulo). The greatest benefit is that it allows the same logic-gate setup to perform both addition and subtraction, since subtracting is just adding a negative number.

That is why, in the last post, we got -6 when performing NOT on 5: 5 is 0000 0101, and NOT turns it into 1111 1010, which represents -6 in Two's Complement.
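The modulo definition translates directly into code. Here is a minimal Python sketch (the function name is mine) that reproduces the NOT-on-5 example:

```python
def twos_complement_value(c: int, n_bits: int = 8) -> int:
    """Interpret the n-bit pattern whose unsigned value is C as a Two's Complement number."""
    m = 2 ** (n_bits - 1)
    t = c % (2 * m)                      # T = C mod 2M ...
    return t - 2 * m if t >= m else t    # ... taking the representative in [-M, M-1]

five = 0b00000101
not_five = ~five & 0xFF                  # bitwise NOT, kept to 8 bits: 1111 1010
print(bin(not_five))                     # 0b11111010
print(twos_complement_value(not_five))   # -6
```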

Heuristics

Besides True Form and Two's Complement, there are two other schemes for representing negative numbers; both are rarely used.

One’s Complement:

T = C \bmod (2M-1)

Since this is a modulo operation, we take T to be the representative that falls into the integer range [-(M-1), M-1]. This gives:

0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
...
0111 1111 = 127
1000 0000 = -127
1000 0001 = -126
1000 0010 = -125
...
1111 1110 = -1
1111 1111 = -0

Offset Binary:

T = (C+M) \bmod (2M)

Since this is a modulo operation, we take T to be the representative that falls into the integer range [-M, M-1]. This gives:

0000 0000 = -128
0000 0001 = -127
...
0111 1110 = -2
0111 1111 = -1
1000 0000 = 0
1000 0001 = 1
1000 0010 = 2
...
1111 1110 = 126
1111 1111 = 127

There are some heuristics for easy conversion between them:

For a non-negative number, its True Form, One's Complement, and Two's Complement are identical, and its Offset Binary is obtained by flipping the first bit.
For a negative number, its One's Complement is its True Form with every bit except the first (sign) bit inverted; its Two's Complement is its One's Complement plus 1; and its Offset Binary is its Two's Complement with the first bit flipped.

All these can be easily proven by modulo mathematics.

For example,

| True Value (Base-10) | True Form | One's Complement | Two's Complement | Offset Binary |
| --- | --- | --- | --- | --- |
| 5 | 0000 0101 | 0000 0101 | 0000 0101 | 1000 0101 |
| 35 | 0010 0011 | 0010 0011 | 0010 0011 | 1010 0011 |
| -5 | 1000 0101 | 1111 1010 | 1111 1011 | 0111 1011 |
| -35 | 1010 0011 | 1101 1100 | 1101 1101 | 0101 1101 |
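The table above can be reproduced directly from the modulo definitions. Here is a small Python sketch (the function name is mine) that prints the four 8-bit forms of any representable value:

```python
def four_forms(t: int, n_bits: int = 8) -> dict:
    """Return the n-bit True Form, One's Complement, Two's Complement, and Offset Binary of true value t."""
    m = 2 ** (n_bits - 1)

    def bits(c: int) -> str:
        return format(c, f"0{n_bits}b")

    return {
        "True Form": bits(t if t >= 0 else m - t),     # sign bit plus magnitude
        "One's Complement": bits(t % (2 * m - 1)),     # the C with T = C mod (2M - 1)
        "Two's Complement": bits(t % (2 * m)),         # the C with T = C mod 2M
        "Offset Binary": bits((t + m) % (2 * m)),      # the C with T = (C + M) mod 2M, i.e. C = T + M
    }

for value in (5, 35, -5, -35):
    print(value, four_forms(value))
```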

To summarize:

| A Byte of Binary Digits | True Value (Interpreted as True Form) | True Value (Interpreted as One's Complement) | True Value (Interpreted as Two's Complement) | True Value (Interpreted as Offset Binary) |
| --- | --- | --- | --- | --- |
| 0000 0000 | 0 | 0 | 0 | -128 |
| 0000 0001 | 1 | 1 | 1 | -127 |
| 0000 0010 | 2 | 2 | 2 | -126 |
| ... | ... | ... | ... | ... |
| 0111 1110 | 126 | 126 | 126 | -2 |
| 0111 1111 | 127 | 127 | 127 | -1 |
| 1000 0000 | -0 | -127 | -128 | 0 |
| 1000 0001 | -1 | -126 | -127 | 1 |
| ... | ... | ... | ... | ... |
| 1111 1101 | -125 | -2 | -3 | 125 |
| 1111 1110 | -126 | -1 | -2 | 126 |
| 1111 1111 | -127 | -0 | -1 | 127 |

Floating-Point Numbers, and Other Numerical Types

Now, think about how to represent numbers with fractional parts. In the previous post we saw that 47.625_{(10)} = 101111.101_{(2)}. So what about storing this number with 9 bits, the first 6 for its integer part and the last 3 for its fractional part? This works, but it brings trouble when numbers with different counts of integer and fractional digits need to be represented. A solution is to allocate a fixed number of bits for the integer part and a fixed number of bits for the fractional part. For example, we could use two bytes for each number, with the first byte representing its integer part and the second byte representing its fractional part.

This is called Fixed-point representation, in the sense that it fixes the position of the binary point in each binary sequence that represents a number. However, the accuracy and range obtained from this scheme are not satisfactory, which is why computers use floating-point representation (the IEEE 754 standards) instead. If you are interested, you can delve deeper into those standards.
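To illustrate the fixed-point idea with the example number, here is a minimal Python sketch (function names are mine; the 8-integer-bit / 8-fractional-bit split is just the two-byte example above, not a standard, and negatives are ignored for simplicity):

```python
def to_fixed_point(x: float, frac_bits: int = 8) -> int:
    """Encode a non-negative x as a 16-bit pattern: 8 integer bits, 8 fractional bits."""
    return round(x * (1 << frac_bits))       # shift the binary point right by frac_bits

def from_fixed_point(pattern: int, frac_bits: int = 8) -> float:
    """Decode a fixed-point pattern back to a number."""
    return pattern / (1 << frac_bits)

pattern = to_fixed_point(47.625)
print(format(pattern, "016b"))               # 0010111110100000, i.e. 101111.101 with the point fixed
print(from_fixed_point(pattern))             # 47.625

# The accuracy is limited: anything finer than 1/256 is rounded away.
print(from_fixed_point(to_fixed_point(0.001)))   # 0.0
```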

Conclusion

Just knowing the principle of bits as possibilities and one-to-one correspondence is not sufficient. Under that principle, there can be numerous practical schemes implementing it, like the four above for integers and the various IEEE standards for floating-point numbers, so it is important to choose wisely. Next up, we will use the possibilities that bits give us to represent characters and symbols.

-- 龙雨
Published on September 2, 2021, GMT+8
Updated on September 2, 2021, GMT+8