Converting an XML file to JSON with Python 3

程序员文章站 2022-03-20 15:45:28

This script uses Python's xml.etree.ElementTree library. It was written for Python 3.6.6.
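
As background for the listing that follows, here is a minimal sketch of the ElementTree features the converter relies on: each element exposes `tag`, `attrib` and `text`, and iterating an element yields its direct children. The sample XML string and the variable names are made up for illustration.

```python
from xml.etree import ElementTree

# Hypothetical sample document, used only for illustration.
SAMPLE = '<root id="1"><item>a</item><item>b</item><note lang="en"> hi </note></root>'

root = ElementTree.fromstring(SAMPLE)

print(root.tag)     # element name
print(root.attrib)  # attributes as a dict

# Iterating an element yields its direct children.
child_tags = [child.tag for child in root]
print(child_tags)

# Text content keeps surrounding whitespace, so the converter strips it.
note_text = root.find("note").text.strip()
print(note_text)
```

Note that two children share the tag `item` here; this is the case the converter has to store as a list rather than as dictionary keys.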

# The module is named ElementTree; the alias keeps the lowercase name used below.
from xml.etree import ElementTree as elementtree

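# Return values of checkxmlchildrentype: how an element's children are stored.
listtype = 1  # at least one child tag repeats, so children go into a list
dicttype = 0  # every child tag is unique, so children become dict keys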

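def getdictresults(res_dicts, iters):
    # All child tags are distinct: merge the children into the parent as keys.
    result_dicts = {}
    # Element.getchildren() was removed in Python 3.9; iterate the element instead.
    for child in iters:
        iterxml(child, result_dicts)

    if result_dicts:
        res_dicts[iters.tag].update(result_dicts)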

def getlistresults(res_dicts, iters):
    # At least one child tag repeats: store every child as its own dict in a list.
    result_lists = []
    for child in iters:
        result_dicts = {}
        iterxml(child, result_dicts)
        result_lists.append(result_dicts)

    if result_lists:
        if not res_dicts[iters.tag]:
            # The parent has no attributes or text: the list replaces its dict.
            res_dicts[iters.tag] = result_lists
        else:
            # The parent already has attributes or text: keep them next to the list.
            res_dicts[iters.tag]["__xmlobjchildren__"] = result_lists

def checkxmlchildrentype(iters):
    # Returns dicttype if every child tag is unique, listtype if any tag repeats.
    taglist = [child.tag for child in iters]

    if len(set(taglist)) == len(taglist):
        return dicttype
    else:
        return listtype

def getresults(res_dicts, iters):
    if checkxmlchildrentype(iters) == listtype:
        return getlistresults(res_dicts, iters)
    else:
        return getdictresults(res_dicts, iters)

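# res_dicts: dict that receives one entry keyed by the element's tag
def iterxml(element, res_dicts):
    res_dicts[element.tag] = {}

    # Attributes become plain key/value pairs.
    if element.attrib:
        res_dicts[element.tag].update(element.attrib)

    # Non-blank text is stored under a reserved key.
    if element.text is not None and element.text.strip() != "":
        res_dicts[element.tag].update({"__xmltagtext__" : element.text.strip()})

    # len(element) is the number of children (getchildren() was removed in Python 3.9).
    if len(element):
        getresults(res_dicts, element)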

def parserxmltojson(file_path):
    try:
        tree = elementtree.parse(file_path)
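    except Exception as e:
        # Common error messages and their usual causes:
        # "multi-byte encodings are not supported": convert the file to UTF-8
        # "encoding specified in XML declaration is incorrect": the declared encoding differs from the file's actual encoding
        # "syntax error": malformed XML, garbled characters, etc.
        # "not well-formed (invalid token)": an editor re-saved the file as ASCII or similar, or the file's actual encoding differs from the declared one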
        print("parser {} error, errmsg: {}".format(file_path, e))
        return ""

    if tree is None:
        print("{} is None.".format(file_path))
        return ""

    root = tree.getroot()

    report = {}
    iterxml(root, report)
    #return getdictresults(root)

    return report

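if __name__ == "__main__":
    import json  # only needed here, for writing the output file

    jsonret = parserxmltojson("test.xml")
    with open("test.json", "w", encoding="utf-8") as fd:
        # str(jsonret) would write a Python repr (single quotes), which is not valid JSON.
        json.dump(jsonret, fd, ensure_ascii=False, indent=4)
    print(jsonret)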
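
To make the output convention concrete, here is a compact, self-contained sketch that follows the same rules as the listing above. It is not the article's code: the helper name `element_to_dict`, the two constant names and the sample XML strings are assumptions for illustration. Attributes become keys, non-blank text goes under `__xmltagtext__`, children with unique tags are merged as keys, and children with a repeated tag become a list, stored under `__xmlobjchildren__` when the parent already has attributes or text.

```python
import json
from xml.etree import ElementTree

TEXT_KEY = "__xmltagtext__"          # reserved key for element text
CHILDREN_KEY = "__xmlobjchildren__"  # reserved key for a list of children


def element_to_dict(element):
    """Hypothetical helper: return {tag: value} for one element."""
    value = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        value[TEXT_KEY] = text

    children = [element_to_dict(child) for child in element]
    tags = [child.tag for child in element]
    if children and len(set(tags)) == len(tags):
        # Every child tag is unique: merge the children in as keys.
        for child in children:
            value.update(child)
    elif children and value:
        # A tag repeats and the parent has attributes or text: keep both.
        value[CHILDREN_KEY] = children
    elif children:
        # A tag repeats and the parent is otherwise empty: the list replaces the dict.
        value = children
    return {element.tag: value}


unique = element_to_dict(ElementTree.fromstring(
    "<config><host>localhost</host><port>8080</port></config>"))
repeated = element_to_dict(ElementTree.fromstring(
    "<items><item>a</item><item>b</item></items>"))
mixed = element_to_dict(ElementTree.fromstring(
    '<library name="city"><book id="1">Dune</book><book id="2">Emma</book></library>'))

for result in (unique, repeated, mixed):
    print(json.dumps(result, ensure_ascii=False))
```

One consequence of this convention is that the shape of a node depends on its siblings: a single `item` child appears as a key, while two `item` children turn the parent into a list, so code that reads the JSON has to handle both cases.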