【debug】UnicodeDecodeError: codec can't decode byte 0xbd in position 4: invalid start byte

程序员文章站 (Programmer Article Station), 2022-07-14 10:41:22

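Problem description:

Python 3 strings are Unicode, so reading a text file means decoding its bytes. If the bytes are not valid in the encoding used for decoding (the platform default when none is passed to open()), Python raises UnicodeDecodeError. The codec named in the message is the one Python tried to decode with, which is not necessarily the file's actual encoding. For example:

Case 1:

The text is UTF-8 encoded, and reading it directly raises an error: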

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 4: invalid start byte

Traceback (most recent call last):
  File "C:/Users/dan/Desktop/python/codec.py", line 2, in <module>
    print(f.read())
  File "C:\Python37\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 4: invalid start byte

Case 2:

The text is GBK encoded, and reading it directly raises an error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 8: incomplete multibyte sequence

Traceback (most recent call last):
  File "C:/Users/dan/Desktop/python/codec.py", line 2, in <module>
    print(f.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 8: incomplete multibyte sequence
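
This kind of failure can be reproduced in memory, without any file. The sketch below is only an illustration: the helper name try_decode and the sample text are assumptions, not part of the original post. It encodes a short Chinese string as GBK and then decodes the bytes with a mismatched and a matching codec.

```python
def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success, or (None, error) if decoding fails."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


sample = "编码测试"  # illustrative sample text (assumption)
gbk_bytes = sample.encode("gbk")

# Mismatched codec: GBK bytes decoded as UTF-8 are rejected.
text_bad, err_bad = try_decode(gbk_bytes, "utf-8")
if err_bad is not None:
    # exc.encoding names the codec that was tried, not the data's real encoding
    print("failed with codec:", err_bad.encoding)

# Matching codec: the same bytes decode cleanly.
text_ok, err_ok = try_decode(gbk_bytes, "gbk")
print("decoded correctly:", text_ok == sample)
```

The encoding attribute of the exception is the codec that was attempted, which is why the message alone does not reveal how the file was actually saved.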

Solution:

Read the text with the encoding that matches the file.

Case 1:

Pass encoding='utf-8':

with open('./case1.txt', 'r', encoding='utf-8') as f:
    print(f.read())

Case 2:

Pass encoding='gbk':
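
By analogy with Case 1, a minimal sketch of the GBK read could look like this. The file name case2.txt and the sample text are assumptions for illustration; the sketch first writes a small GBK file into a temporary directory so that it can run on its own.

```python
import os
import tempfile

# Assumed file name; the sample file is created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "case2.txt")
with open(path, "w", encoding="gbk") as f:
    f.write("编码测试")

# Read the file back with the matching encoding, mirroring Case 1.
with open(path, "r", encoding="gbk") as f:
    content = f.read()
    print(content)
```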

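Addendum:

The following code detects a text file's encoding automatically and then reads it. It is useful when many files have to be read and their encodings differ: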

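import chardet  # encoding-detection module (third-party package)
with open('./case.txt', 'rb') as f:  # 'rb' reads raw bytes without decoding them into characters
    ecd = chardet.detect(f.read())['encoding']  # guess the encoding from the file's bytes
with open('./case.txt', 'r', encoding=ecd) as f:  # reopen the file with the detected encoding
    read = f.read()
    print(read)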
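
When installing chardet is not an option, a standard-library alternative is to try a short list of candidate encodings in order. This is a different technique from the detection above, sketched under assumptions: the function name decode_with_candidates and the candidate list ("utf-8", "gbk") are hypothetical choices for illustration.

```python
def decode_with_candidates(data: bytes, candidates=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


sample = "编码测试"  # illustrative sample text (assumption)

# GBK bytes are rejected by UTF-8 and then accepted by GBK.
text_gbk, used_gbk = decode_with_candidates(sample.encode("gbk"))

# UTF-8 bytes are accepted by the first candidate.
text_utf8, used_utf8 = decode_with_candidates(sample.encode("utf-8"))

print(used_gbk, used_utf8)
```

For a file, the bytes would come from open(path, 'rb').read(). The order matters: UTF-8 is tried first because it is strict, whereas GBK can sometimes accept bytes that were not written as GBK and return garbled text without raising an error.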
