(Custom Python utility) Comparing complex XML files and generating an HTML view of the differences
程序员文章站
2023-08-18 10:15:58
Design goal:
Compare complex XML documents while ignoring the order in which sibling nodes appear.
Main topics covered:
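1. ElementTree library ------- XML parsing:
- Import ElementTree:
  import xml.etree.ElementTree as ET
- Parse the XML and find the root node:
  - Parse an XML file directly and get the root node:
    tree = ET.parse('country_data.xml')
    root = tree.getroot()
  - Parse a string:
    root = ET.fromstring(country_data_as_string)
- Iterate over the root node to get its child nodes, then read whichever fields you need. For example, for <APP_KEY channel = 'CSDN'> hello123456789 </APP_KEY>:
  - tag: the tag name, identifying what kind of data the element holds; here APP_KEY
  - attrib: the attributes, stored as a dictionary; here {'channel': 'CSDN'}
  - text: the text string, which can hold data; here hello123456789
  - tail: the trailing string that follows the closing tag; it is optional, and the example does not contain one.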
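The four element fields listed above can be inspected with a few lines of Python. This is a minimal sketch: the `<config>` wrapper, the `SAMPLE` string, and the `describe` helper are made up for illustration; only the APP_KEY element comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the APP_KEY element is taken from the article.
SAMPLE = "<config><APP_KEY channel='CSDN'> hello123456789 </APP_KEY>\n</config>"


def describe(element):
    """Collect the four fields of an Element into a plain dict (illustrative helper)."""
    return {
        "tag": element.tag,              # element name
        "attrib": dict(element.attrib),  # attributes as a dictionary
        "text": element.text,            # text between the opening and closing tag
        "tail": element.tail,            # text after the closing tag, up to the next tag
    }


root = ET.fromstring(SAMPLE)
for child in root:
    print(describe(child))
```

Note that `text` keeps its surrounding whitespace, so a comparison tool usually has to decide whether to strip it.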
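2. difflib library ------- provides classes and functions for comparing sequences. It can compare files and produce the differences either as text or as an HTML comparison page.
This tool uses the class difflib.HtmlDiff, which builds an HTML table showing the differences between files. It can show the full text, or only the differing lines with their surrounding context.
Its constructor is:
__init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
- tabsize: the number of spaces a tab character stands for; defaults to 8
- wrapcolumn: optional; the column at which lines are wrapped. Defaults to None, meaning lines are not wrapped (key point: setting it makes the HTML look much tidier)
- linejunk and charjunk: optional; they are passed through to ndiff(), which HtmlDiff uses to compute the side-by-side differences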
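Public method (returns a complete HTML file, as a string, containing a table that shows the differences):
make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
- fromlines and tolines: the content to compare, each given as a list of strings
- fromdesc and todesc: optional; the column titles for fromlines and tolines in the comparison page; default to empty strings
- context and numlines: optional. When context is True, only the differences and their surrounding context are shown; when it is False, the full text is shown. numlines defaults to 5. When context is True, it controls how many context lines surround each difference; when context is False, it controls how many lines are shown above a difference when jumping between highlights with the "next" links (if set to 0, the "next" links place the next difference at the very top of the browser with no leading context)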
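As a rough illustration of the constructor and make_file parameters described above, the sketch below compares two short lists of lines. The sample lines and the "expected"/"actual" titles are invented for the example; the wrapcolumn value of 65 matches the one used in the script in this article.

```python
import difflib

# Invented sample input: each side is a list of strings, one per line.
old_lines = ["<a>1</a>", "<b>2</b>", "<c>3</c>"]
new_lines = ["<a>1</a>", "<b>20</b>", "<c>3</c>"]

# Wrap long lines at column 65 so the table stays readable.
differ = difflib.HtmlDiff(tabsize=8, wrapcolumn=65)

# context=True keeps only the changed lines plus numlines lines around them.
html = differ.make_file(
    old_lines,
    new_lines,
    fromdesc="expected",
    todesc="actual",
    context=True,
    numlines=2,
)

print(len(html))  # make_file returns the whole page as one string
```

The returned string can be written to a .html file and opened in a browser.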
3. platform library -------- detects the current operating system
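A small sketch of how platform.system() can drive the choice of output directory. The two paths are the ones used in the script in this article; the `default_log_dir` helper and its `system_name` parameter are assumptions added here so the choice can be exercised without switching machines.

```python
import platform


def default_log_dir(system_name=None):
    """Pick a report directory based on the operating system name.

    system_name is a hypothetical override for testing; by default the
    name reported by platform.system() is used ('Windows', 'Linux', ...).
    """
    name = platform.system() if system_name is None else system_name
    if "Windows" in name:
        return "c:/RobotLog"
    return "/tmp/RobotLog"


print(default_log_dir())
```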
4. logger library (robot.api.logger) -------- if you use Robot Framework, this makes a visible difference: it lets you customize what is shown in the log
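Robot Framework is pleasant to work with, probably because its test report already covers most needs. Few people try to modify the report or add their own content to it, such as a hyperlink that leads to more detail. As a result, I spent a long time searching online for material on this without finding any, and in the end had to read the source code.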
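For readers who want to add such a hyperlink, here is a hedged sketch. It assumes Robot Framework's robot.api.logger module and its info(msg, html=True) call; the helper names `html_link` and `log_report_link` are made up for this example, and the import is guarded so the snippet also runs where Robot Framework is not installed.

```python
import html


def html_link(path):
    """Build an anchor tag pointing at a generated report (illustrative helper)."""
    safe = html.escape(path, quote=True)
    return '<a href="%s">%s</a>' % (safe, safe)


def log_report_link(path):
    """Write the link to the Robot Framework log, or print it as a fallback."""
    link = html_link(path)
    try:
        from robot.api import logger  # only available when Robot Framework is installed
    except ImportError:
        print(link)
        return link
    logger.info(link, html=True)  # html=True asks the log to render the message as HTML
    return link


log_report_link("/tmp/RobotLog/1.html")
```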
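Regrets and things to improve:
One part of the code was originally meant to be written recursively, but the data passed between calls was easy to get muddled, so the nesting is written out by hand instead. I plan to optimize this part later.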
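One possible shape for the recursive version is sketched below. It is not the author's implementation: the names `normalize` and `to_lines` are invented, and unlike the script in this article it strips whitespace around text and has no fixed depth limit. Like the script, it keeps text only for leaf elements.

```python
import xml.etree.ElementTree as ET


def normalize(element):
    """Return (tag, sorted attributes, text, sorted children) for an element."""
    # Sorting the normalized children makes sibling order irrelevant.
    children = sorted((normalize(child) for child in element), key=repr)
    text = (element.text or "").strip()
    attrs = tuple(sorted(element.attrib.items()))
    return (element.tag, attrs, text, tuple(children))


def to_lines(node, depth=0):
    """Render a normalized node back into indented lines suitable for difflib."""
    tag, attrs, text, children = node
    attr_text = "".join(" %s='%s'" % pair for pair in attrs)
    pad = " " * depth
    if not children:
        return ["%s<%s%s>%s</%s>" % (pad, tag, attr_text, text, tag)]
    lines = ["%s<%s%s>" % (pad, tag, attr_text)]
    for child in children:
        lines.extend(to_lines(child, depth + 1))
    lines.append("%s</%s>" % (pad, tag))
    return lines


first = ET.fromstring("<r><x>1</x><y k='v'>2</y></r>")
second = ET.fromstring("<r>\n <y k='v'>2</y>\n <x>1</x>\n</r>")
print(to_lines(normalize(first)) == to_lines(normalize(second)))
```

Each call returns its own result, so no shared state has to be threaded through the recursion; the two line lists can then be handed to difflib.HtmlDiff.make_file.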
############################## Code section below; the attached file can be copied to a local machine and run to view the result ##################################################
# coding=utf-8
import re
import xml.etree.ElementTree as ET  # XML parsing library
import difflib  # file comparison library
import datetime  # date and time library
import platform  # detects the operating system: Windows, Linux, ...
import os
from robot.api import logger  # comment this out if not needed: Robot Framework writes a log when scripts run, and this library lets you customize it

# listafter: convert the parsed XML into a sorted list: (tag, attrib, (tag, attrib, text))
# This function is called by xmltolist() below; to see the actual result, print what xmltolist() returns.
# Handles at most 10 levels of nesting, written out by hand.
# Note: sorted() compares nested lists that contain dicts and None; Python 2 allows this, Python 3 can raise TypeError.
def listafter(listcom1):
    listcomarr1 = []
    text1 = []
    listcomarr1.append(listcom1.tag)
    listcomarr1.append(listcom1.attrib)
    if len(listcom1) > 0:
        for listcom2 in listcom1:
            listcomarr2 = []
            text2 = []
            listcomarr2.append(listcom2.tag)
            listcomarr2.append(listcom2.attrib)
            if len(listcom2) > 0:
                for listcom3 in listcom2:
                    listcomarr3 = []
                    text3 = []
                    listcomarr3.append(listcom3.tag)
                    listcomarr3.append(listcom3.attrib)
                    if len(listcom3) > 0:
                        for listcom4 in listcom3:
                            listcomarr4 = []
                            text4 = []
                            listcomarr4.append(listcom4.tag)
                            listcomarr4.append(listcom4.attrib)
                            if len(listcom4) > 0:
                                for listcom5 in listcom4:
                                    listcomarr5 = []
                                    text5 = []
                                    listcomarr5.append(listcom5.tag)
                                    listcomarr5.append(listcom5.attrib)
                                    if len(listcom5) > 0:
                                        for listcom6 in listcom5:
                                            listcomarr6 = []
                                            text6 = []
                                            listcomarr6.append(listcom6.tag)
                                            listcomarr6.append(listcom6.attrib)
                                            if len(listcom6) > 0:
                                                for listcom7 in listcom6:
                                                    listcomarr7 = []
                                                    text7 = []
                                                    listcomarr7.append(listcom7.tag)
                                                    listcomarr7.append(listcom7.attrib)
                                                    if len(listcom7) > 0:
                                                        for listcom8 in listcom7:
                                                            listcomarr8 = []
                                                            text8 = []
                                                            listcomarr8.append(listcom8.tag)
                                                            listcomarr8.append(listcom8.attrib)
                                                            if len(listcom8) > 0:
                                                                for listcom9 in listcom8:
                                                                    listcomarr9 = []
                                                                    text9 = []
                                                                    listcomarr9.append(listcom9.tag)
                                                                    listcomarr9.append(listcom9.attrib)
                                                                    # Start: decide whether to descend another level
                                                                    if len(listcom9) > 0:
                                                                        for listcom10 in listcom9:
                                                                            listcomarr10 = []
                                                                            text10 = []
                                                                            listcomarr10.append(listcom10.tag)
                                                                            listcomarr10.append(listcom10.attrib)
                                                                            listcomarr10.append([listcom10.text])
                                                                            text9.append(listcomarr10)
                                                                    else:
                                                                        text9.append(listcom9.text)
                                                                    # End: decide whether to descend another level
                                                                    # sort the two-dimensional list
                                                                    text9 = sorted(text9)
                                                                    listcomarr9.append(text9)
                                                                    text8.append(listcomarr9)
                                                            else:
                                                                text8.append(listcom8.text)
                                                            text8 = sorted(text8)
                                                            listcomarr8.append(text8)
                                                            text7.append(listcomarr8)
                                                    else:
                                                        text7.append(listcom7.text)
                                                    text7 = sorted(text7)
                                                    listcomarr7.append(text7)
                                                    text6.append(listcomarr7)
                                            else:
                                                text6.append(listcom6.text)
                                            text6 = sorted(text6)
                                            listcomarr6.append(text6)
                                            text5.append(listcomarr6)
                                    else:
                                        text5.append(listcom5.text)
                                    text5 = sorted(text5)
                                    listcomarr5.append(text5)
                                    text4.append(listcomarr5)
                            else:
                                text4.append(listcom4.text)
                            text4 = sorted(text4)
                            listcomarr4.append(text4)
                            text3.append(listcomarr4)
                    else:
                        text3.append(listcom3.text)
                    text3 = sorted(text3)
                    listcomarr3.append(text3)
                    text2.append(listcomarr3)
            else:
                text2.append(listcom2.text)
            text2 = sorted(text2)
            listcomarr2.append(text2)
            text1.append(listcomarr2)
    else:
        text1.append(listcom1.text)
    text1 = sorted(text1)
    listcomarr1.append(text1)
    return listcomarr1

# Convert the XML content into a sorted list. Returns three values: the processed spmlxmllist,
# the header spmlstart that needs no processing, and the trailer spmlend that needs no processing.
# spmlstart and spmlend set aside the header and trailer that need no processing, which speeds things up.
def xmltolist(spml):
    if spml.find("<spml:") != -1:
        startnum = re.search(r'<spml:[^>]*>', spml).span()[1]
        endnum = spml.rfind("</spml:")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<spml:modifyRequest xmlns:spml='{spml}' xmlns:subscriber="{subscriber}" xmlns:xsi="{xsi}">\n%s</spml:modifyRequest>''' % (
            spml[startnum:endnum].strip())
    elif spml.find("<PlexViewRequest") != -1:
        startnum = re.search(r'<PlexViewRequest[^>]*>', spml).span()[1]
        endnum = spml.rfind("</PlexViewRequest>")
        spmlstart = spml[:startnum].strip()
        spmlend = spml[endnum:].strip()
        spmlxml = '''<PlexViewRequest>\n%s</PlexViewRequest>''' % (spml[startnum:endnum].strip())
    else:
        spmlstart = ""
        spmlend = ""
        spmlxml = spml
    # print spmlstart
    # print endspml
    # print spmlxml
    tree = ET.fromstring(spmlxml)
    spmlxmllist = listafter(tree)
    return spmlxmllist, spmlstart, spmlend

# Turn the spmlxmllist produced by xmltolist back into XML (siblings are now in sorted order)
def listtoxml(spmllist1):
    kong = " "
    spmltag1 = spmllist1[0]
    spmlattrib1 = ""
    bodyxml1 = ""
    if spmllist1[1] != {}:
        for key, value in spmllist1[1].items():
            spmlattrib1 += " %s='%s'" % (key, value)
    startxml1 = "<%s%s>" % (spmltag1, spmlattrib1)
    endxml1 = "</%s>" % (spmltag1)
    spmlxml1 = ""
    if isinstance(spmllist1[2][0], list):
        spmlxml2 = ""
        for spmllist2 in spmllist1[2]:
            spmltag2 = spmllist2[0]
            spmlattrib2 = ""
            bodyxml2 = ""
            if spmllist2[1] != {}:
                for key, value in spmllist2[1].items():
                    spmlattrib2 += " %s='%s'" % (key, value)
            startxml2 = "<%s%s>" % (spmltag2, spmlattrib2)
            endxml2 = "</%s>" % (spmltag2)
            if isinstance(spmllist2[2][0], list):
                spmlxml3 = ""
                for spmllist3 in spmllist2[2]:
                    spmltag3 = spmllist3[0]
                    spmlattrib3 = ""
                    bodyxml3 = ""
                    if spmllist3[1] != {}:
                        for key, value in spmllist3[1].items():
                            spmlattrib3 += " %s='%s'" % (key, value)
                    startxml3 = "<%s%s>" % (spmltag3, spmlattrib3)
                    endxml3 = "</%s>" % (spmltag3)
                    if isinstance(spmllist3[2][0], list):
                        spmlxml4 = ""
                        for spmllist4 in spmllist3[2]:
                            spmltag4 = spmllist4[0]
                            spmlattrib4 = ""
                            bodyxml4 = ""
                            if spmllist4[1] != {}:
                                for key, value in spmllist4[1].items():
                                    spmlattrib4 += " %s='%s'" % (key, value)
                            startxml4 = "<%s%s>" % (spmltag4, spmlattrib4)
                            endxml4 = "</%s>" % (spmltag4)
                            if isinstance(spmllist4[2][0], list):
                                spmlxml5 = ""
                                for spmllist5 in spmllist4[2]:
                                    spmltag5 = spmllist5[0]
                                    spmlattrib5 = ""
                                    bodyxml5 = ""
                                    if spmllist5[1] != {}:
                                        for key, value in spmllist5[1].items():
                                            spmlattrib5 += " %s='%s'" % (key, value)
                                    startxml5 = "<%s%s>" % (spmltag5, spmlattrib5)
                                    endxml5 = "</%s>" % (spmltag5)
                                    if isinstance(spmllist5[2][0], list):
                                        spmlxml6 = ""
                                        for spmllist6 in spmllist5[2]:
                                            spmltag6 = spmllist6[0]
                                            spmlattrib6 = ""
                                            bodyxml6 = ""
                                            if spmllist6[1] != {}:
                                                for key, value in spmllist6[1].items():
                                                    spmlattrib6 += " %s='%s'" % (key, value)
                                            startxml6 = "<%s%s>" % (spmltag6, spmlattrib6)
                                            endxml6 = "</%s>" % (spmltag6)
                                            if isinstance(spmllist6[2][0], list):
                                                spmlxml7 = ""
                                                for spmllist7 in spmllist6[2]:
                                                    spmltag7 = spmllist7[0]
                                                    spmlattrib7 = ""
                                                    bodyxml7 = ""
                                                    if spmllist7[1] != {}:
                                                        for key, value in spmllist7[1].items():
                                                            spmlattrib7 += " %s='%s'" % (key, value)
                                                    startxml7 = "<%s%s>" % (spmltag7, spmlattrib7)
                                                    endxml7 = "</%s>" % (spmltag7)
                                                    if isinstance(spmllist7[2][0], list):
                                                        spmlxml8 = ""
                                                        for spmllist8 in spmllist7[2]:
                                                            spmltag8 = spmllist8[0]
                                                            spmlattrib8 = ""
                                                            bodyxml8 = ""
                                                            if spmllist8[1] != {}:
                                                                for key, value in spmllist8[1].items():
                                                                    spmlattrib8 += " %s='%s'" % (key, value)
                                                            startxml8 = "<%s%s>" % (spmltag8, spmlattrib8)
                                                            endxml8 = "</%s>" % (spmltag8)
                                                            if isinstance(spmllist8[2][0], list):
                                                                spmlxml9 = ""
                                                                for spmllist9 in spmllist8[2]:
                                                                    spmltag9 = spmllist9[0]
                                                                    spmlattrib9 = ""
                                                                    bodyxml9 = ""
                                                                    if spmllist9[1] != {}:
                                                                        for key, value in spmllist9[1].items():
                                                                            spmlattrib9 += " %s='%s'" % (key, value)
                                                                    startxml9 = "<%s%s>" % (spmltag9, spmlattrib9)
                                                                    endxml9 = "</%s>" % (spmltag9)
                                                                    if isinstance(spmllist9[2][0], list):
                                                                        spmlxml10 = ""
                                                                        for spmllist10 in spmllist9[2]:
                                                                            spmltag10 = spmllist10[0]
                                                                            spmlattrib10 = ""
                                                                            bodyxml10 = ""
                                                                            if spmllist10[1] != {}:
                                                                                for key, value in spmllist10[1].items():
                                                                                    spmlattrib10 += " %s='%s'" % (key, value)
                                                                            startxml10 = "<%s%s>" % (spmltag10, spmlattrib10)
                                                                            endxml10 = "</%s>" % (spmltag10)
                                                                            bodyxml10 = spmllist10[2][0]
                                                                            spmlxml10 += "\n%s%s%s%s" % (kong * 9, startxml10, bodyxml10, endxml10)
                                                                        spmlxml9 += "\n%s%s%s\n%s%s" % (kong * 8, startxml9, spmlxml10, kong * 8, endxml9)
                                                                    else:
                                                                        bodyxml9 = spmllist9[2][0]
                                                                        spmlxml9 += "\n%s%s%s%s" % (kong * 8, startxml9, bodyxml9, endxml9)
                                                                spmlxml8 += "\n%s%s%s\n%s%s" % (kong * 7, startxml8, spmlxml9, kong * 7, endxml8)
                                                            else:
                                                                bodyxml8 = spmllist8[2][0]
                                                                spmlxml8 += "\n%s%s%s%s" % (kong * 7, startxml8, bodyxml8, endxml8)
                                                        spmlxml7 += "\n%s%s%s\n%s%s" % (kong * 6, startxml7, spmlxml8, kong * 6, endxml7)
                                                    else:
                                                        bodyxml7 = spmllist7[2][0]
                                                        spmlxml7 += "\n%s%s%s%s" % (kong * 6, startxml7, bodyxml7, endxml7)
                                                spmlxml6 += "\n%s%s%s\n%s%s" % (kong * 5, startxml6, spmlxml7, kong * 5, endxml6)
                                            else:
                                                bodyxml6 = spmllist6[2][0]
                                                spmlxml6 += "\n%s%s%s%s" % (kong * 5, startxml6, bodyxml6, endxml6)
                                        spmlxml5 += "\n%s%s%s\n%s%s" % (kong * 4, startxml5, spmlxml6, kong * 4, endxml5)
                                    else:
                                        bodyxml5 = spmllist5[2][0]
                                        spmlxml5 += "\n%s%s%s%s" % (kong * 4, startxml5, bodyxml5, endxml5)
                                spmlxml4 += "\n%s%s%s\n%s%s" % (kong * 3, startxml4, spmlxml5, kong * 3, endxml4)
                            else:
                                bodyxml4 = spmllist4[2][0]
                                spmlxml4 += "\n%s%s%s%s" % (kong * 3, startxml4, bodyxml4, endxml4)
                        spmlxml3 += "\n%s%s%s\n%s%s" % (kong * 2, startxml3, spmlxml4, kong * 2, endxml3)
                    else:
                        bodyxml3 = spmllist3[2][0]
                        spmlxml3 += "\n%s%s%s%s" % (kong * 2, startxml3, bodyxml3, endxml3)
                spmlxml2 += "\n%s%s%s\n%s%s" % (kong * 1, startxml2, spmlxml3, kong * 1, endxml2)
            else:
                bodyxml2 = spmllist2[2][0]
                spmlxml2 += "\n%s%s%s%s" % (kong * 1, startxml2, bodyxml2, endxml2)
        spmlxml1 += "\n%s%s\n%s" % (startxml1, spmlxml2, endxml1)
    else:
        bodyxml1 = spmllist1[2][0]
        spmlxml1 += "\n%s%s%s" % (startxml1, bodyxml1, endxml1)
    return spmlxml1

# Join startspml, xmlspml and endspml back together; part of this needs adapting to your own data
def regroupspml(startspml, xmlspml, endspml):
    xmlspml = str(xmlspml).replace("{{", "").replace("}}", ":").strip().splitlines()
    if endspml != "":
        startspml = str(startspml.strip()).replace("\"", "\'")
        startspml = re.sub(" +>", ">", startspml)
        startspml = startspml.splitlines()
        endspml = str(endspml.strip()).splitlines()
        spmlxmlcom = startspml + xmlspml[1:-1] + endspml
    else:
        spmlxmlcom = xmlspml
    return spmlxmlcom

# Compare the two sorted XML documents and generate an HTML file that shows the differences directly.
# Returns 0 when the documents match, otherwise the path of the generated HTML file.
def diffspml(spmlxml1, spmlxml2):
    spmlxmllist1, spmlstart1, spmlend1 = xmltolist(spmlxml1)
    spmlxmllist2, spmlstart2, spmlend2 = xmltolist(spmlxml2)
    spmlxmlcom1 = listtoxml(spmlxmllist1)
    spmlxmlcom2 = listtoxml(spmlxmllist2)
    spmlxmlcom1 = regroupspml(spmlstart1, spmlxmlcom1, spmlend1)
    spmlxmlcom2 = regroupspml(spmlstart2, spmlxmlcom2, spmlend2)
    # print spmlstart1
    # print spmlend1
    if spmlxmlcom1 == spmlxmlcom2:
        return 0
    else:
        global diffspmNum
        global outputhtml_dir
        try:
            diffspmNum += 1
        except NameError:
            # first difference found: the counter does not exist yet
            diffspmNum = 1
            system = platform.system()
            if ('Windows' in system):
                outputhtml_dir = "c:/RobotLog"
            else:
                outputhtml_dir = "/tmp/RobotLog"
            outputhtml_dir = "%s/%s" % (outputhtml_dir, datetime.datetime.now().strftime('%Y%m%d_%H%M%S'))
            os.makedirs(outputhtml_dir)
        Loghtmldir = "%s/%s.html" % (outputhtml_dir, diffspmNum)
        # logger.write("<a href=\"%s\">%s</a>" % (Loghtmldir, Loghtmldir), "HTML")
        hd = difflib.HtmlDiff(8, 65)
        with open(Loghtmldir, 'w') as fo:
            fo.write(hd.make_file(spmlxmlcom1, spmlxmlcom2))
        return Loghtmldir
############################################# End of the code section #################################################################
spmlxml1='''
<PlexViewRequest SessionId="${sessionid}" ProvisioningGroup="volte" Command="ed-ngfs-subscriber-v2"><SubParty><PrimaryPUID>+86${ISDN}@${domain}</PrimaryPUID><PartyId>+86${ISDN}</PartyId></SubParty><SeqRinging><RingingList>null^null^true^true^10^false`+86${msisdn1}^STANDARD^true^true^10^false</RingingList><DefaultAnswerTimeout>10</DefaultAnswerTimeout><Send181Mode>TAS_181_NONE</Send181Mode><Activated>true</Activated><PublicUID>+86${ISDN}@${domain}</PublicUID><Assigned>true</Assigned></SeqRinging></PlexViewRequest>
'''
spmlxml2='''
<PlexViewRequest SessionId="${sessionid}" ProvisioningGroup="volte" Command="ed-ngfs-subscriber-v2">
  <SubParty>
    <PrimaryPUID>+86${ISDN}@${domain}</PrimaryPUID>
    <PartyId>+86${ISDN}</PartyId>
  </SubParty>
  <SeqRinging>
    <RingingList>null^null^true^true^10^false`+86${msisdn1}^STANDARD^true^true^10^false</RingingList>
    <DefaultAnswerTimeout>10</DefaultAnswerTimeout>
    <Send181Mode>TAS_180_NONE</Send181Mode>
    <Activated>true</Activated>
    <PublicUID>+86${ISDN}@${domain}</PublicUID>
  </SeqRinging>
</PlexViewRequest>
'''
print(diffspml(spmlxml1, spmlxml2))
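##################################### The XML above is in the format used at my company (it can be more complex still); parts of the functions are handled specially to suit our own needs ####################################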