《CS：APP》 chapter 2 Representing and Manipulating Informat

程序员文章站 2022-04-26 11:53:23

...

Representing and Manipulating Information 首先，普及一下历史知识，原来十进制数都使用1000多年了。。。其实我真不知道。。。斐波拉契(Fibonacci)很屌的说(废话。。。。) The familiar decimal, or base-10, representation has been in use for over 100

Representing and Manipulating Information

首先，普及一下历史知识，原来十进制数都使用1000多年了。。。其实我真不知道。。。斐波拉契(Fibonacci)很屌的说(废话。。。。)

The familiar decimal, or base-10, representation has been in use for over 1000 years, having been developed in India, improved by Arab mathematicians in the 12th century, and brought to the West in the 13th century by the Italian mathematician Leonardo Pisano (c. 1170 – c. 1250), better known as Fibonacci.

为什么计算机选择二进制进行基础运算，而不是十进制？

Binary values work better when building machines that store and process information. Two-valued signals can
readily be represented, stored, and transmitted

数据Overflow的根本原因

Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large to be rep-resented

2.1 Information Storage

Rather than accessing individual bits in memory, most computers use blocks of eight bits, or bytes, as the smallest addressable unit of memory. Every byte of memory is identified by a unique number, known as its
address.

2.1.1 Hexadecimal Notation
A single byte consists of 8 bits. In binary notation, its value ranges from 00000000 to 11111111. When viewed as a decimal integer, its value ranges from 0 to 255.Neither notation is very convenient for describing bit patterns. Binary notation is too verbose, while with decimal notation, it is tedious to convert to and from bit patterns. Instead, we write bit patterns as base-16, or hexadecimal numbers. Hexadecimal (or simply “hex”) uses digits ‘0’ through ‘9’ along with characters ‘A’ through ‘F’ to represent 16 possible values.

《CS：APP》 chapter 2 Representing and Manipulating Informat

关于10进制，16进制，8进制，2进制之间的相互转换不在此赘述，已经讲烂了。。。

2.1.2 Words
Every computer has aword size , indicating the nominal size of integer and pointer data.

As hardware costs drop over time, even desktop and laptop machines will switch to 64-bit word sizes, and so we will consider the general case of a w -bit word size, as well as the special cases of w = 32 and w = 64.

2.1.3 Data Sizes

《CS：APP》 chapter 2 Representing and Manipulating Informat

　　　 A　“long” integer uses the full word size of the machine. The “long long” integer　data type, introduced in ISO C99, allows the full range of 64-bit integers. For 32-bit　machines, the compiler must compile operations for this data type by generating　code that performs sequences of 32-bit operations.

2.1.4 Addressing and Byte Ordering

《CS：APP》 chapter 2 Representing and Manipulating Informat

　　　　　The former convention—where the least signifi-cant byte comes first—is referred to as little endian. This convention is followed　by most Intel-compatible machines. The latter convention—where the most sig-nificant byte comes first—is referred to as big endian. This convention is followed　by most machines from IBM and Sun Microsystems. Note that we said “most.”　The conventions do not split precisely along corporate boundaries.

Byte ordering becomes an issue. The first is when binary data are communicated over a network between different machines. A common problem is for data produced by a little-endian machine to be sent to a big-endian machine, or vice versa, leading to the bytes within the words being in reverse order for the receiving program.

A second case where byte ordering becomes important is when looking at the byte sequences representing integer data. This occurs often when inspecting machine-level programs. As an example, the following line occurs in a file that gives a text representation of the machine-level code for an Intel IA32 processor:

80483bd: 01 05 64 94 04 08 add %eax,0x8049464

This line was generated by adisassembler, a tool that determines the instruction sequence represented by an executable program file. We will learn more about disassemblers and how to interpret lines such as this in Chapter 3. For now, we simply note that this line states that the hexadecimal byte sequence 01 05 64 94 04 08 is the byte-level representation of an instruction that adds a word ofdata to the value stored at address 0x8049464 . If we take the final 4 bytes of the sequence, 64940408, and write them in reverse order, we have 08049464. Dropping the leading 0, we have the value 0x8049464 , the numeric value written on the right.

2.1.5 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having value 0) character. Each character is represented by some standard encoding, with the most common being the ASCII character code. Thus, if we run our routine show_byteswith arguments "12345" and 6 (to include the terminating character), we get the result 313233343500 .

对于数据的位运算和逻辑运算不在此赘述

2.2 Integer Representations

2.2.1 Integral Data Types

《CS：APP》 chapter 2 Representing and Manipulating Informat

回答为什么二进制数串能表示十进制数

The unsigned binary representation has the important property that every number between 0 and 2^w? 1 has a unique encoding as a w -bit value. For example, there is only one representation of decimal value 11 as an unsigned, 4-bit number, namely [1011]. This property is captured in mathematical terms by stating that function B2Uw is a bijection —it associates a unique value to each bit vector of length w ; conversely, each integer between 0 and 2^w? 1 has a unique binary representation as a bit vector of length w.

2.2.3 Two’s-Complement Encodings

我前几天才知道，所谓的“补码”是two's-complement。。。。只有*说补码，香港和*说的是二补数

The most com-mon computer representation of signed numbers is known as two’s-complement form. This is defined by interpreting the most significant bit of the word to have negative weight. We express this interpretation as a function B2Tw(for “binary to two’s-complement” length w ):

《CS：APP》 chapter 2 Representing and Manipulating Informat

We see that the bit patterns are identical for Figures 2.11 and 2.12 (as well as for Equations 2.2 and 2.4), but the values differ when the most significant bit is 1, since in one case it has weight +8, and in the other case it has weight ?8.)

《CS：APP》 chapter 2 Representing and Manipulating Informat

The C standards do not require signed integers to be represented in two’s-complement form, but nearly all machines do so.

C 标准并没有要求用two's complement 来储存负数，但是大多数极其就是这么干的

对于one's complement and sign-magnitude,分别是大陆说的反码和有符号数

《CS：APP》 chapter 2 Representing and Manipulating Informat

对于理解取相反数以及无符号数，有符号数之间的强制类型转换,极其重要的一张图：

《CS：APP》 chapter 2 Representing and Manipulating Informat

2.2.4 Conversions Between Signed and Unsigned

For example, consider the following code:

1 short int v = -12345;
2 unsigned short uv = (unsigned short) v;
3 printf("v = %d, uv = %u\n", v, uv);

When run on a two’s-complement machine, it generates the following output:

v = -12345, uv = 53191

What we see here is that the effect of casting is to keep the bit values identical but change how these bits are interpreted.We saw in Figure 2.14 that the 16-bit two’s-complement representation of ?12,345 is identical to the 16-bit unsigned representation of 53,191.

Casting from shortint to unsignedshortchanged the numeric value, but not the bit representation. In casting from unsignedintto int , the underlying bit representa-tion stays the same.

《CS：APP》 chapter 2 Representing and Manipulating Informat

2.2.5 Signed vs. Unsigned in C

Although the C standard does not specify a particu-lar representation of signed numbers, almost all machines use two’s complement. Generally, most numbers are signed by default.

C allows conversion between unsigned and signed. The rule is that the under-lying bit representation is not changed.

C允许强制类型转换

对于赋值符号，右边的操作数被强制类型转换成左边的数据类型

而对于比较操作符，操作数中任意一个是无符号数，则将两个操作数都强制类型转换成为无符号的

2.2.6 Expanding the Bit Representation of a Number

数据的位扩展，到这里我就真的不知道怎么组织语言了....T-T

数据类型位数多的向数据类型位数少的扩展，如果是负数，高位补1，如果是正数高位补0。。。就这样吧。。。。

2.2.7 Truncating Numbers

数据的截断

说白了，就是转向位数小的数据类型时，数据进行取模

《CS：APP》 chapter 2 Representing and Manipulating Informat

切记 malloc的参数是size_t ，而size_t 是unsigned int！！！忘记这个会引发血案的！哈哈

2.3 Integer Arithmetic

2.3.1 Unsigned Addition

无符号的加法

一般的加法都很简单，一个坑就是溢出，主要谈的就是这个坑。。。

《CS：APP》 chapter 2 Representing and Manipulating Informat

值得一提的是无符号数只有向上溢出(upward overflow)，没有向下溢出，向下溢出只可能出现在有符号运算中

2.3.2 Two’s-Complement Addition

In fact, most computers use the same machine instruction to perform eitherunsigned or signed addition.

《CS：APP》 chapter 2 Representing and Manipulating Informat

四种加法情况

给出了向上溢出和向下溢出的情况

《CS：APP》 chapter 2 Representing and Manipulating Informat

两个很有意思的练习题，有必要贴一下

Practice Problem 2.30

Write a function with the following prototype:

/* Determine whether arguments can be added without overflow */

int tadd_ok(int x, int y);

This function should return 1 if argumentsx and y can be added withoutcausing overflow.

Solution to Problem 2.30

This function is a direct implementation of the rules given to determine whether
or not a two’s-complement addition overflows.
/* Determine whether arguments can be added without overflow */

int tadd_ok(int x, int y) { int sum = x+y; int neg_ove r=x=0; int pos_ove r=x>=0&&y>=0&&sum

Practice Problem 2.31

Your coworker gets impatient with your analysis of the overflow conditionsfor two’s-complement addition and presents you with the followingimplementation of tadd_ok :

/* Determine whether arguments can be added without overflow */

/* WARNING: This code is buggy. */

int tadd_ok(int x, int y) {

int sum = x+y;

return (sum-x == y) && (sum-y == x);

}

You look at the code and laugh. Explain why.

Solution to Problem 2.31

Your coworker could have learned, by studying div 2.3.2, that two’s-complement addition forms an abelian group, and so the expression (x+y)-x will evaluate to y regardless of whether or not the addition overflows, and that
(x+y)-y will always evaluate to x.

2.3.3 Two’s-Complement Negation

取相反数基本上就可以概括为一句话——取反加1

In C, we can state that for any integer valuex, computing the expressions -xand~x+1 will give identical results.

《CS：APP》 chapter 2 Representing and Manipulating Informat

2.3.4 Unsigned Multiplication

乘法不是很单纯的数学上的乘法，由于溢出的问题，所有的乘法都会有溢出的可能，有溢出就有取模

《CS：APP》 chapter 2 Representing and Manipulating Informat

2.3.5 Two’s-Complement Multiplication

《CS：APP》 chapter 2 Representing and Manipulating Informat

非常有名的一个vulnerability如下:

int tmult_ok(int x,int y)
{

    Long long pll = (long long)x*y;//分别扩展成为64位的数据再做乘法运算

    Return pll == (int)pll;

}

如果把Long long pll = (long long) x*y;

改写成为

Long long pll = x*y;

结果将是32位的数据x*y，然后扩展成64位的数据

2.3.6 Multiplying by Constants

On most machines, the integer multiply instruction is fairly slow,requiring 10 or more clock cycles, whereas other integer operations—such asaddition, subtraction, bit-level operations, and shifting—require only 1 clockcycle.

2.3.7 Dividing by Powers of Two

Integer division on most machines is even slower than integer multiplication—requiring30 or more clock cycles. The two different shifts—logical and arithmetic—serve this purpose forunsigned and two’s-complement numbers, respectively.

2.4 Floating Point

让人感觉很高深的float point，开始！

2.4.1 Fractional Binary Numbers

《CS：APP》 chapter 2 Representing and Manipulating Informat

2.4.2 IEEE Floating-Point Representation

The sign s determines whether the number is negative (s = 1) or positive (s = 0), where the interpretation of the sign bit for numeric value 0 is handled as a special case.

The significand M is a fractional binary number that ranges either between 1 and 2 ?theta or between 0 and 1? theta

The exponent E weights the value by a (possibly negative) power of 2.

The bit representation of a floating-point number is divided into three fields to
encode these values:

The single sign bit s directly encodes the sign s .

The k -bit exponent field exp = ek ?1...e1e0 encodes the exponent E .

The n-bit fraction field frac= fn?1...f1f0 encodes the significand M , but the value encoded also depends on whether or not the exponent field equals 0.

In the single-precision floating-point format (a float in C), fieldss, exp , and frac are 1, k = 8, and n = 23 bits each, yielding a 32-bit representation. In the double-precision floating-point format (a double in C), fields s, exp , and fracare 1, k = 11, andn = 52 bits each, yielding a 64-bit representation.

《CS：APP》 chapter 2 Representing and Manipulating Informat

Case 1: Normalized Values
This is the most common case. It occurs when the bit pattern of exp is neither all zeros (numeric value 0) nor all ones (numeric value 255 for single precision, 2047 for double). In this case, the exponent field is interpreted as representing a signed integer inbiased form. That is, the exponent value is E = e ? Bias where e
is the unsigned number having bit representation ek ?1...e1e0, and Bias is a bias value equal to 2^(k ?1) ? 1 (127 for single precision and 1023 for double). This yields exponent ranges from ?126 to +127 for single precision and ?1022 to+1023 for double precision. The fraction field fracis interpreted as representing the fractional value f ,
where 0 ≤ fM = 1 + f . This is sometimes called an implied leading 1 representation, because we can viewM to be the number with binary representation 1 .fn?1fn?2...f0. Thisrepresentation is a trick for getting an additional bit of precision for free, since we can always adjust the exponent E so that significand M is in the range 1 ≤ M