Converting Chinese Character Encodings in Python 2

程序员文章站 2022-05-10 19:30:47


What the various encodings mean:
Reference: https://blog.csdn.net/qq_33733970/article/details/81084465

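GB2312 is the Chinese national standard character set and encoding for Simplified Chinese.
GBK extends GB2312: it stays compatible with GB2312 and adds many more characters, including Traditional Chinese ones.
cp936 is Windows code page 936. On a Chinese-locale Windows system, cmd uses CP936 as its default code page. It is Microsoft's implementation of GBK, a superset of GB2312, and Python treats cp936 as an alias of its gbk codec.
Unicode is a character encoding standard, maintained by an international organization, that aims to cover the characters and symbols of all the world's writing systems.
UTF-8 (8-bit Unicode Transformation Format) is the most widely used encoding for transmitting and storing Unicode. It represents each code point with a variable number of bytes, from one to four. Each ASCII character takes a single byte with the same value as in ASCII, so ASCII is a subset of UTF-8.
UTF-8, UTF-16, and UTF-32 are all encoding schemes that convert code point numbers into the bytes a program stores or transmits.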
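The differences between these encodings are easiest to see by encoding the same text with each of them. The following is a minimal sketch, written to run on both Python 2.7 and Python 3; the variable names (text, sizes, same_bytes, ascii_is_subset) are illustrative and not part of the original article.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.
from __future__ import print_function

text = u'\u4e2d\u56fd'  # the two characters 中国, written as escapes

# Encode the same two code points with several codecs and compare sizes.
sizes = {}
for codec in ('gb2312', 'gbk', 'cp936', 'utf-8', 'utf-16-le', 'utf-32-le'):
    sizes[codec] = len(text.encode(codec))
    print(codec, sizes[codec])

# GB2312, GBK and cp936 agree on characters that GB2312 already contains.
same_bytes = text.encode('gb2312') == text.encode('gbk') == text.encode('cp936')

# ASCII text encodes to identical bytes under ASCII and UTF-8.
ascii_is_subset = u'abc'.encode('ascii') == u'abc'.encode('utf-8')
print(same_bytes, ascii_is_subset)
```

The three GB-family codecs use two bytes per character here, UTF-8 uses three, UTF-16 two, and UTF-32 four.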

# -*- coding: cp936 -*-
# str: a byte string
s = '中国'
print type(s)  # <type 'str'>
print s

# unicode: a Unicode string
u = u'中国'
print type(u)  # <type 'unicode'>
print u

# Encode the unicode object u into a str (bytes)
str1 = u.encode('cp936')
print type(str1)
print str1

str2 = u.encode('utf-8')
print type(str2)
print str2

# Decode the str (bytes) s into a unicode object
str3 = s.decode('gbk')
print type(str3)
print str3

Output:

<type 'str'>
中国
<type 'unicode'>
中国
<type 'str'>
中国
<type 'str'>
中国
<type 'unicode'>
中国
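The encode and decode calls above combine into the usual way of converting bytes from one encoding to another: decode to Unicode first, then encode to the target. The sketch below assumes a hypothetical helper named transcode, which is not part of the original script, and runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.

def transcode(data, src, dst):
    """Re-encode a byte string from codec src to codec dst (hypothetical helper)."""
    # Bytes -> Unicode text -> bytes in the target encoding.
    return data.decode(src).encode(dst)

gbk_bytes = b'\xd6\xd0\xb9\xfa'            # 中国 encoded as GBK / cp936
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')
round_trip = transcode(utf8_bytes, 'utf-8', 'cp936')
```

Going through Unicode in the middle means the helper only needs to know the source and target codec names, not a conversion table for each pair.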

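Replacing # -*- coding: cp936 -*- with # -*- coding: utf-8 -*- gives the following output. The last step fails because s now holds UTF-8 bytes, which are not a valid GBK sequence: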

<type 'str'>
中国
<type 'unicode'>
中国
<type 'str'>
中国
<type 'str'>
中国

Traceback (most recent call last):
  File "C:\Users\gezi9\Desktop\编码转换.py", line 24, in <module>
    #灏唖tr缂栫爜鐨勫瓧绗︿覆s-->鍙樻崲鎴怳nicode鏍煎紡
UnicodeDecodeError: 'gbk' codec can't decode bytes in position 2-3: illegal multibyte sequence
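The UnicodeDecodeError above arises when bytes are decoded with a codec other than the one that produced them. One way to cope when the source encoding is uncertain is to try candidate codecs in order, as in this sketch. The helper decode_with_fallback is hypothetical and is not a general encoding detector; the code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.

def decode_with_fallback(data, codecs=('utf-8', 'gbk')):
    """Try each codec in order; return (text, codec) for the first success.

    Hypothetical helper for illustration only.
    """
    for codec in codecs:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the codecs could decode the data')

utf8_bytes = b'\xe4\xb8\xad\xe5\x9b\xbd'   # 中国 in UTF-8
gbk_bytes = b'\xd6\xd0\xb9\xfa'            # 中国 in GBK / cp936

text_a, codec_a = decode_with_fallback(utf8_bytes)
text_b, codec_b = decode_with_fallback(gbk_bytes)

# Decoding the UTF-8 bytes as GBK reproduces the failure shown above.
try:
    utf8_bytes.decode('gbk')
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True
```

UTF-8 is tried first because its byte patterns are strict, so text in another encoding rarely decodes as UTF-8 by accident; the reverse order would be more likely to accept the wrong codec.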

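Removing the coding declaration altogether (neither # -*- coding: cp936 -*- nor # -*- coding: utf-8 -*-) gives the following output. The bytes of the u'' literal are then no longer decoded as cp936, so u starts with u'\xd6' instead of u'\u4e2d', and encoding it as cp936 fails: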

<type 'str'>
中国
<type 'unicode'>
ÖÐ¹ú

Traceback (most recent call last):
  File "C:\Users\gezi9\Desktop\编码转换.py", line 15, in <module>
    str1 = u.encode('cp936')
UnicodeEncodeError: 'gbk' codec can't encode character u'\xd6' in position 0: illegal multibyte sequence
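The garbled output ÖÐ¹ú and the u'\xd6' in the traceback are consistent with each GBK byte having been read as one separate character. The sketch below reproduces that effect and then reverses it. It assumes the misreading is equivalent to a Latin-1 decode, which is an inference from the output rather than something the original states. The code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.

gbk_bytes = b'\xd6\xd0\xb9\xfa'          # 中国 encoded as GBK / cp936

# Reading each byte as one Latin-1 character reproduces the garbled text.
garbled = gbk_bytes.decode('latin-1')    # u'\xd6\xd0\xb9\xfa', shown as ÖÐ¹ú

# Encoding that text as cp936 fails, as in the traceback above.
try:
    garbled.encode('cp936')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Undo the wrong decoding, then decode with the codec the bytes really use.
repaired = garbled.encode('latin-1').decode('gbk')
```

The repair works only because Latin-1 maps every byte value to exactly one character, so no information is lost by the wrong decode.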

Running the same statements in the interactive shell:

>>> s = '中国'
>>> type(s)
<type 'str'>
>>> s
'\xd6\xd0\xb9\xfa'
>>> print s
中国

>>> u = u'中国'
>>> type(u)
<type 'unicode'>
>>> u
u'\u4e2d\u56fd'
>>> print u
中国

>>> str1 = u.encode('cp936')
>>> type(str1)
<type 'str'>
>>> str1
'\xd6\xd0\xb9\xfa'

>>> str2 = u.encode('utf-8')
>>> type(str2)
<type 'str'>
>>> str2
'\xe4\xb8\xad\xe5\x9b\xbd'

>>> str3 = s.decode('gbk')
>>> type(str3)
<type 'unicode'>
>>> str3
u'\u4e2d\u56fd'

>>>
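The escaped values shown by the shell can also be produced explicitly, which is handy when comparing encodings in a script rather than interactively. This is a small sketch using the standard binascii module; the variable names are illustrative, and the code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.
import binascii

u = u'\u4e2d\u56fd'                       # 中国

# Code points of the Unicode string, matching the \u escapes in the shell.
code_points = ['U+%04X' % ord(ch) for ch in u]

# Hex dumps of the encoded bytes, matching the \x escapes in the shell.
gbk_hex = binascii.hexlify(u.encode('cp936')).decode('ascii')
utf8_hex = binascii.hexlify(u.encode('utf-8')).decode('ascii')
```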
