欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

zipfile模块解压zip文件中文乱码

程序员文章站 2022-05-31 17:33:27
...

在python2.7当中使用zipfile模块解压包含中文文件名的zip文件会出现奇奇怪怪的ASCII码拓展当中的字符
为此写了一个实现用于解决这一问题,主要是python2.7,由于2和3的编码问题,Python3应该不会这个问题(没具体测试过3解压中文会怎么样)

以下是实现,具体可以看看zipfile源码当中对于数据流的处理,很容易写出来
以下代码是从本人项目当中抽取出来稍作修改过后的,没有经过实际测试,如果有问题自行调试,总体处理方式应该是没有问题的。

个人gists:https://gist.github.com/ClamCao/bf0aef6a83526b038d3194de0540bf03

# coding=utf8
# !/usr/bin/python2.7

import chardet
import os
import zipfile

def decompressChar(string, encode=None):
     try:
         if isinstance(string, unicode):
             string = string.encode("UTF8")
         elif encode is not None:
             string = unicode(string, encode)
         else:
             encodeTyp = chardet.detect(string)["encoding"]
             if encodeTyp is not None:
                 string = string.decode(encodeTyp).encode("UTF8")
     except UnicodeDecodeError as e:
         # 出现此错误一般为GBK编码涉及范围不及GB18030
         if 'gb2312' in e:
             string = string.decode('GB18030').encode("UTF8")
     return string


def zipfilePreprocess(zipfilePath, ):

    with zipfile.ZipFile(zipfilePath, ) as zipfileHandler:
        # 检查zip文件是否进行了加密,此处可以参考源码的实现(对于加密文件不做处理)
        # check zipfile whether encrypted or not(do nothing for encrypted file here)
        # refer -> zipfile source code line#983
        if filter(lambda x: x.flag_bits & 0x01, zipfileHandler.infolist()):
            return

        # 提取zip文件当中的各文件名和对应的文件内容,其实质上文件内容就是以字典结构存储在dict当中
        # Extract each file name and corresponding file content in the zip file
        members = [(f, member) for f, member in zipfileHandler.NameToInfo.iteritems()]
        for subFname, member in members:
            subFname = os.path.split(subFname)[-1]
            # change path by yourself, default current path
            subFilePath = os.path.join("./", decompressChar(subFname))
            # 通过文件写入的方式来对乱码处理,这是最关键的一步
            # 对于会出现zipfile模块解压zip文件乱码主要是winzip压缩的文件出现此类情况比较多,具体解释可以参考下面注释链接当中的内容说明
            # it is point for processing messy char
            # do not use zipfile.extractall() to unzip,otherwise filename,contained chinese, will be a mess after that. zipped by WinZip specially.
            # refer -> https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.write
            #       -> zipfile source line#1082
            with open(subFilePath, 'wb') as subFileHandler:
                shutil.copyfileobj(zipfileHandler.open(member), subFileHandl
```