【debug】UnicodeDecodeError: codec can't decode byte 0xbd in position 4: invalid start byte
程序员文章站
2022-07-14 10:41:22
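Problem description:
Python 3 strings are Unicode, so the bytes of a text file have to be decoded when the file is read. If open() is called without an encoding argument, Python uses a platform-dependent default (for example GBK on Chinese-language Windows). When the file's bytes are not valid in the encoding used for decoding, Python raises UnicodeDecodeError, for example: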
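To see which default applies on a given machine, the following minimal sketch prints the encoding that text-mode open() falls back to, and shows that one character maps to different bytes under UTF-8 and GBK. The variable names are illustrative only, and the printed default will differ from system to system.

```python
import locale

# Encoding that open() uses in text mode when no encoding argument is given.
# The value depends on the platform and locale settings.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# The same character is stored as different bytes in different encodings,
# which is why bytes must be decoded with the encoding they were written in.
utf8_bytes = "中".encode("utf-8")
gbk_bytes = "中".encode("gbk")
print(utf8_bytes, gbk_bytes)
```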
Case 1:
The text is UTF-8 encoded but is decoded as GBK (the default on Chinese-language Windows when no encoding is passed). Reading it fails:
Traceback (most recent call last):
  File "C:/Users/dan/Desktop/python/codec.py", line 2, in <module>
    print(f.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 8: incomplete multibyte sequence
Case 2:
The text is GBK encoded but is decoded as UTF-8. Reading it fails:
Traceback (most recent call last):
  File "C:/Users/dan/Desktop/python/codec.py", line 2, in <module>
    print(f.read())
  File "C:\Python37\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 4: invalid start byte
Note that the codec named in the message is the one Python used for decoding, not the encoding the file was written in.
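Both kinds of mismatch can be reproduced in memory, without any files. The sketch below is only an illustration: the helper try_decode and the sample text are made up for this example. It encodes one character in one encoding, decodes it with the other, and inspects the resulting exception, whose encoding attribute names the codec that was used for decoding.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc

sample = "中"  # hypothetical sample text

# GBK bytes decoded as UTF-8, and UTF-8 bytes decoded as GBK: both fail.
gbk_as_utf8 = try_decode(sample.encode("gbk"), "utf-8")
utf8_as_gbk = try_decode(sample.encode("utf-8"), "gbk")

for result in (gbk_as_utf8, utf8_as_gbk):
    # .encoding is the codec used for decoding, .start the failing byte offset
    print(type(result).__name__, result.encoding, result.start)

# With the matching codec, the same bytes decode cleanly.
print(try_decode(sample.encode("gbk"), "gbk"))
```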
Solution:
Open the file with the encoding it was actually written in.
Case 1:
Add the argument encoding='utf-8':
with open('./case1.txt', 'r', encoding='utf-8') as f:
    print(f.read())
Case 2:
Add the argument encoding='gbk'.
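For the GBK case, a self-contained sketch might look like this. It first writes a throwaway GBK file so that it can run anywhere; the file name case2.txt and the sample text are assumptions made for the example, not something from the original setup.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "case2.txt")  # hypothetical file name

    # Create a GBK-encoded file to read back.
    with open(path, "w", encoding="gbk") as f:
        f.write("编码测试")

    # Passing the file's real encoding makes the read succeed.
    with open(path, "r", encoding="gbk") as f:
        content = f.read()

print(content)
```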
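Addendum:
The following code detects a file's encoding automatically and then reads the text. It is useful when many files have to be read and their encodings differ. The detection result is a best guess rather than a guarantee: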
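import chardet  # third-party encoding detection package (pip install chardet)

with open('./case.txt', 'rb') as f:  # 'rb' reads raw bytes without decoding them into characters
    ecd = chardet.detect(f.read())['encoding']  # guess the encoding of the file (may be None)
with open('./case.txt', 'r', encoding=ecd) as f:  # reopen the file with the detected encoding
    read = f.read()
print(read)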
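When installing chardet is not an option and the possible encodings are known in advance, a different and simpler technique is to try each candidate in turn. This is a sketch of that fallback approach, not of what chardet does internally; the function name read_text_with_fallback and the candidate list are assumptions for the example. UTF-8 is tried first because it rejects most non-UTF-8 input, whereas GBK accepts many byte sequences that were never meant as GBK.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    with open(path, "rb") as f:  # read raw bytes once
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode {path!r}")

# Usage with two throwaway files, one per encoding (hypothetical names).
with tempfile.TemporaryDirectory() as tmp_dir:
    results = {}
    for enc in ("utf-8", "gbk"):
        path = os.path.join(tmp_dir, f"case_{enc}.txt")
        with open(path, "w", encoding=enc) as f:
            f.write("编码测试")
        results[enc] = read_text_with_fallback(path)

print(results)
```

Unlike text-mode open(), bytes.decode() does not translate line endings, so files written on Windows keep their "\r\n" sequences with this approach.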