PyPDF2 Chinese Configuration
程序员文章站
2022-05-25 19:02:27
...
PyPDF2 Chinese Settings
PyPDF2 uses Latin-1 encoding by default, so it raises errors when it handles documents that contain Chinese text.
Everything in this article works on both Linux and Windows and has been tested.
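To see why Latin-1 is the problem, the following standard-library sketch (no PyPDF2 needed) tries to encode a Chinese string with Latin-1 and with UTF-8. The helper name `try_encode` is hypothetical and only used for illustration.

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Latin-1 only covers code points U+0000..U+00FF, so CJK characters fail here
        return None


sample = "中文"
print(try_encode(sample, "latin-1"))  # None
print(try_encode(sample, "utf-8"))    # b'\xe4\xb8\xad\xe6\x96\x87'
```

Latin-1 maps each character to exactly one byte, so it has no room for Chinese characters; UTF-8 uses three bytes for each of them.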
Quick method (overwrite the files):
Download the configuration files
Copy the downloaded generic.py and utils.py into the ...\site-packages\PyPDF2 directory, overwriting the existing files, and you are done.
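If you are not sure where the ...\site-packages\PyPDF2 directory is, a small standard-library sketch like the one below can report where Python actually imports a package from. The function name `package_dir` is hypothetical. Run it with the same interpreter (or virtual environment) you use for your PDF scripts, because each environment has its own site-packages.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory a top-level package is loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package directory
        return None
    return Path(list(spec.submodule_search_locations)[0])


# Prints something like .../site-packages/PyPDF2, or None if PyPDF2 is not installed
print(package_dir("PyPDF2"))
```

The generic.py and utils.py files to replace sit directly inside the printed directory.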
Custom method (edit the files yourself):
In utils.py, change lines 244 to 247, which read:
```python
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
```
to:
```python
r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
```
In generic.py, change lines 483 to 492, which read:
```python
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")
```
to:
```python
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
```
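The nested `try` amounts to a fallback chain: UTF-8 first, then GBK, then either the raw bytes with a warning or an error in strict mode. The sketch below reproduces that order in a standalone function so it can be tested without a PDF. `decode_name` is a hypothetical name, and it uses `ValueError` and a plain warning in place of PyPDF2's `utils.PdfReadError` and `utils.PdfReadWarning`.

```python
import warnings
from typing import Union


def decode_name(raw: bytes, strict: bool = False) -> Union[str, bytes]:
    """Try UTF-8, then GBK; otherwise warn and return the raw bytes, or raise if strict."""
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes, try the next one
    if strict:
        raise ValueError("Illegal character in Name Object")
    warnings.warn("Illegal character in Name Object")
    return raw


print(decode_name("中文".encode("utf-8")))  # 中文
print(decode_name("中文".encode("gbk")))    # 中文 (UTF-8 fails, GBK succeeds)
```

GBK accepts a wide range of two-byte sequences, so a name that is neither UTF-8 nor GBK may still decode into unrelated characters instead of reaching the warning branch.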
End of article. Everything above was tested and works on both Windows and Linux as of September 14, 2020.