Difference between HTTP status codes 301 and 302 (blog category: http)
程序员文章站
2024-02-07 10:26:30
<p><span style="font-family: Arial; font-size: 14px; color: #333333; line-height: 26px;">
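<p>The Support Vector Machine (SVM) was first proposed by Cortes and Vapnik in 1995. It shows distinctive advantages on <strong>small-sample</strong>, <strong>nonlinear</strong>, and <strong>high-dimensional pattern recognition</strong> problems, and it extends to other machine learning tasks such as function fitting. The method is built on the <strong>VC-dimension theory and the structural risk minimization</strong> principle of statistical learning theory: given limited sample information, it seeks the best trade-off between model complexity (how accurately the model fits the particular training samples) and learning ability (how well it recognizes arbitrary samples without error), in order to obtain the best generalization ability. To study SVM theory, see <a style="color: #336699; text-decoration: none;" href="http://www.blogjava.net/zhenandaci/category/31868.html">jasper's blog</a>.</p>
<p>LIBSVM is a general-purpose SVM package developed by Dr. Chih-Jen Lin (林智仁) and colleagues at National Taiwan University. It is simple to operate, easy to use, fast, and effective. It handles classification (C-SVC and ν-SVC), regression (ε-SVR and ν-SVR), and distribution estimation (one-class SVM), and it offers four common kernel functions: linear, polynomial, radial basis function (RBF), and sigmoid. It also supports multi-class problems, parameter selection by cross-validation, weighting for unbalanced samples, and probability estimates for multi-class problems. <a style="color: #336699; text-decoration: none;" href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/">LIBSVM</a> is open source. Besides the C++ source code of the algorithms, it provides interfaces for Python, Java, R, MATLAB, Perl, Ruby, LabVIEW, C#.NET, and other languages. It runs on Windows and UNIX platforms, and researchers can easily adapt it to their own needs (for example, by designing a kernel function suited to a specific problem).</p>
<p>Text classification roughly breaks down into the following steps: <strong>samples</strong>, <strong>word segmentation</strong>, <strong>feature extraction</strong>, <strong>vector computation</strong>, <strong>classifier training</strong>, and <strong>testing and debugging</strong>.</p>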
<p> </p>
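<p><span style="font-size: medium;"><strong>1. Sample selection</strong></span></p>
<p>Use the Sogou corpus: <a style="color: #336699; text-decoration: none;" href="http://www.sogou.com/labs/dl/c.html">http://www.sogou.com/labs/dl/c.html</a>. The reduced version is enough for experiments, although you can also download the 107 MB version. You can of course collect your own corpus by crawling the articles under the corresponding channels of the major portal sites, but that takes more work.</p>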
<p> </p>
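<p><span style="font-size: medium;"><strong>2. Word segmentation</strong></span></p>
<p>Bamboo is a word segmentation module based on CRF++. Since the topic here is statistical learning, the segmenter should be statistical as well; segmenting with a dictionary alone would be outdated.</p>
<p>Getting started guide: <a style="color: #336699; text-decoration: none;" href="http://code.google.com/p/nlpbamboo/wiki/GettingStarted">http://code.google.com/p/nlpbamboo/wiki/GettingStarted</a>. After installing bamboo, you also need to download the trained model (it was trained on the January People's Daily corpus).</p>
<p>Download index.tar.bz2 from <a style="color: #336699; text-decoration: none;" href="http://code.google.com/p/nlpbamboo/downloads/list">http://code.google.com/p/nlpbamboo/downloads/list</a> and extract it to /opt/bamboo/index.</p>
<p>Because our main goal is classification rather than segmentation, there is no need to train the segmenter. If you do want to train one, see my other post: <a style="color: #336699; text-decoration: none;" href="http://blog.csdn.net/marising/archive/2010/07/27/5769653.aspx">CRF++ Chinese Word Segmentation Guide</a>.</p>
<p>To check the installation, run /opt/bamboo/bin/bamboo -p crf_seg filename. If the command succeeds, bamboo is installed correctly.</p>
<p>Note that the Sogou corpus files are encoded in GB2312, so convert them to UTF-8 before segmenting. The Python function below takes a file name, converts the file to UTF-8, and then segments it. The segmented output file gets the .seg suffix.</p>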
<div class="dp-highlighter bg_python" style="font-family: Consolas, 'Courier New', Courier, mono, serif; font-size: 12px; background-color: #e7e5dc; width: 687px; margin-top: 18px !important; margin-right: 0px !important; margin-bottom: 18px !important; margin-left: 0px !important; padding-top: 1px;">
<ol class="dp-py" style="margin-top: 0px !important; margin-right: 0px !important; margin-bottom: 1px !important; margin-left: 45px !important; background-color: #ffffff; color: #5c5c5c; padding: 0px;">
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">import</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> os </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> seg(fn): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># convert the GB2312 file to UTF-8 unless a converted copy already exists</span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> os.path.isfile(fn+</span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'.utf8'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cmd = <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'iconv -f gb2312 -t utf8 -c %s &gt; %s.utf8'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> % (fn, fn) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">print</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">(cmd) </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;os.system(cmd) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># segment the UTF-8 file with bamboo; the output goes to fn + '.seg'</span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;cmd = <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'/opt/bamboo/bin/bamboo -p crf_seg %s.utf8 &gt; %s.seg'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> % (fn, fn) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">print</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">(cmd) </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;os.system(cmd) </span></li>
</ol>
</div>
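<p>The same two steps can also be sketched in Python 3 without going through a shell. This is only an illustration, not the author's code: the helper name <code>to_utf8</code> and the constant <code>BAMBOO</code> are made up here, the transcoding is done by Python instead of iconv, and the bamboo command line is taken unchanged from the function above.</p>

```python
import subprocess
from pathlib import Path

# Install path used earlier in this post; adjust it if bamboo lives elsewhere.
BAMBOO = "/opt/bamboo/bin/bamboo"


def to_utf8(fn):
    """Write a UTF-8 copy of a GB2312 file next to it and return its path."""
    src = Path(fn)
    dst = Path(str(fn) + ".utf8")
    if not dst.is_file():
        # errors="ignore" drops undecodable bytes, similar to `iconv -c`.
        text = src.read_bytes().decode("gb2312", errors="ignore")
        dst.write_text(text, encoding="utf-8")
    return dst


def seg(fn):
    """Convert fn to UTF-8, then segment it with bamboo into fn + '.seg'."""
    utf8_path = to_utf8(fn)
    seg_path = Path(str(fn) + ".seg")
    with open(seg_path, "wb") as out:
        # Same arguments as the shell command above, passed as a list.
        subprocess.run(
            [BAMBOO, "-p", "crf_seg", str(utf8_path)],
            stdout=out,
            check=True,
        )
    return seg_path
```

<p>Passing the arguments as a list avoids quoting problems when a corpus file name contains spaces, and <code>check=True</code> raises an exception if bamboo exits with an error instead of silently leaving an empty .seg file.</p>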
<p> </p>
<p>The segmentation output looks like this:</p>
<p>一 家 刚刚 成立 两 年 的 网络 支付 公司 , 它 的 目标 是 成为 市值 100亿 美元 的 上市 公司 。<br>这家 公司 叫做 快 钱 , 说 这 句 话 的 是 快钱 的 CEO 关 国光 。 他 之前 曾 任 网易 的 高级 副 总裁 , 负责 过 网易 的 上市 工作 。 对于 为什么 选择 第三 方 支付 作为 创业 方向 , 他 曾经 对 媒体 这样 说 : “ 我 能 看到 这个 胡同 对面 是 什么 , 别人 只能 看到 这个 胡同 。 ” 自信 与 狂妄 只 有 一 步 之 遥 ―― 这 几乎 是 所有 创业者 的 共同 特征 , 是 自信 还是 狂妄 也许 需要 留待 时间 来 考证 。</p>
<p> </p>
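<p><span style="font-size: medium;"><strong>3. Feature extraction</strong></span></p>
<p>Isn't SVM supposed to be good at high-dimensional pattern recognition? Why extract features at all instead of treating every word as a feature? A vocabulary easily contains more than a hundred thousand words. Selecting the words that discriminate well between categories reduces the amount of computation and should also improve classification quality. For example, GDP, CPI, and 股票 (stocks) discriminate strongly for the economy category, whereas high-frequency words such as 我们 (we), 大家 (everyone), and 一起 (together) do not discriminate at all.</p>
<p>The chi-square test (CHI) and information gain (IG) are reported to work well for feature selection; I chose CHI. For the definitions of both, search online.</p>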
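<p>For reference, the chi-square score of a term for a category is commonly computed from a 2x2 table of document counts as N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)). The sketch below is one way to compute it and is not the author's code: the function names <code>chi_square</code> and <code>chi_scores</code> and the <code>(label, tokens)</code> input layout are assumptions made for illustration.</p>

```python
from collections import defaultdict


def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 document-count table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_scores(docs):
    """docs: list of (label, tokens). Returns {label: {term: score}}."""
    class_size = defaultdict(int)
    df = defaultdict(lambda: defaultdict(int))  # df[term][label] = doc count
    for label, tokens in docs:
        class_size[label] += 1
        for term in set(tokens):  # count each term once per document
            df[term][label] += 1
    total = len(docs)
    scores = {label: {} for label in class_size}
    for term, per_class in df.items():
        term_total = sum(per_class.values())
        for label, size in class_size.items():
            a = per_class.get(label, 0)
            b = term_total - a
            c = size - a
            d = total - size - b
            scores[label][term] = chi_square(a, b, c, d)
    return scores


if __name__ == "__main__":
    # Tiny made-up corpus, only to show the call shape.
    sample = [
        ("economy", ["GDP", "我们"]),
        ("economy", ["GDP", "股票"]),
        ("sports", ["足球", "我们"]),
        ("sports", ["足球", "大家"]),
    ]
    result = chi_scores(sample)
    print(sorted(result["economy"].items(), key=lambda kv: -kv[1]))
```

<p>In this toy corpus GDP appears only in economy documents and gets the highest economy score, while 我们 appears equally in both categories and scores zero, which matches the intuition about discriminative words described above.</p>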
<p>First, count the number of times each word occurs in the documents.</p>
<p> </p>
<div class="dp-highlighter bg_python" style="font-family: Consolas, 'Courier New', Courier, mono, serif; font-size: 12px; background-color: #e7e5dc; width: 687px; margin-top: 18px !important; margin-right: 0px !important; margin-bottom: 18px !important; margin-left: 0px !important; padding-top: 1px;">
<ol class="dp-py" style="margin-top: 0px !important; margin-right: 0px !important; margin-bottom: 1px !important; margin-left: 45px !important; background-color: #ffffff; color: #5c5c5c; padding: 0px;">
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># ignore some terms (punctuation and whitespace tokens)</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> ingore(s): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">return</span> s <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span> (<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'nbsp'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' '</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'\t'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'\n'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">','</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'。'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'!'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'、'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'―'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'?'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'@'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">':'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">';'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'#'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'%'</span>, </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&amp;'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'('</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">')'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'《'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'》'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'['</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">']'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'{'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'}'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'*'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'.'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'-'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&lt;'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&gt;'</span>) \ </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">or</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> s == </span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'['</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">or</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> s == </span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">']'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">or</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> s == </span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'{'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">or</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> s == </span><span class="string" 
style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'}'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;">#term times</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> getterm(fn): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> fnobj = open(fn,<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'r'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> data = fnobj.read() </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> fnobj.close() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> arr = data.split(<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' '</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm = dict() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">for</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> a </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> arr: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> a = a.strip(<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' \n\t'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> ingore(a) </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">and</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> len( a.decode(</span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'utf-8'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">)) >=</span><span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">2</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> times = docterm.get(a) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> times: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm[a] = times + <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">else</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm[a] = <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">return</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;">#cls_term:cls,term,artcount</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;">#term_cls:term,cls,artcount</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> stat(</span><span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">,fn,cls_term,term_cls): </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm = getterm(fn) </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi = cls_term.get(<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi = dict() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> cls_term[<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">] = termdi </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;">#term,times</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">for</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> t </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = termdi.get(t) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">0</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi[t] = artcount + <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi = term_cls.get(t) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi = {} </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> term_cls[t] = clsdi </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = clsdi.get(<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount: </span></span></li>
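<p>Since our main goal here is classification rather than word segmentation, there is no need to train a segmentation model. If you do want to train one, see my other post: <a style="color: #336699; text-decoration: none;" href="http://blog.csdn.net/marising/archive/2010/07/27/5769653.aspx">CRF++ Chinese Word Segmentation Guide</a>.</p>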
<p>To check the installation, run: /opt/bamboo/bin/bamboo -p crf_seg filename. If it succeeds, bamboo is installed correctly.</p>
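<p>The installation check above can also be scripted, which is handy before segmenting a whole corpus. The following is only a minimal sketch: the helper names (bamboo_command, bamboo_available, try_segment) are hypothetical and not part of bamboo, the binary path is the one used in this post, and it assumes that bamboo writes the segmented text to standard output.</p>

```python
import os
import subprocess

# Path used in this post; adjust it if bamboo is installed elsewhere.
BAMBOO_BIN = '/opt/bamboo/bin/bamboo'


def bamboo_command(filename, bamboo_bin=BAMBOO_BIN):
    """Build the argument list for `bamboo -p crf_seg filename`."""
    return [bamboo_bin, '-p', 'crf_seg', filename]


def bamboo_available(bamboo_bin=BAMBOO_BIN):
    """Return True if the bamboo binary exists and is executable."""
    return os.path.isfile(bamboo_bin) and os.access(bamboo_bin, os.X_OK)


def try_segment(filename, bamboo_bin=BAMBOO_BIN):
    """Run bamboo on `filename`.

    Returns the captured standard output as bytes, or None if the
    binary is missing or the command fails.
    """
    if not bamboo_available(bamboo_bin):
        return None
    try:
        return subprocess.check_output(bamboo_command(filename, bamboo_bin))
    except (OSError, subprocess.CalledProcessError):
        return None
```

<p>A None result from try_segment('filename') suggests that bamboo or its model files are not set up correctly; any other result is the raw output of the command.</p>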
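<p> One thing to note: the Sogou corpus files are encoded in GB2312, so convert them to UTF-8 before segmenting. The Python function below takes a file name, converts the file to UTF-8, and then segments it; the segmented output file gets the .seg suffix.</p>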
<div class="dp-highlighter bg_python" style="font-family: Consolas, 'Courier New', Courier, mono, serif; font-size: 12px; background-color: #e7e5dc; width: 687px; margin-top: 18px !important; margin-right: 0px !important; margin-bottom: 18px !important; margin-left: 0px !important; padding-top: 1px;">
<ol class="dp-py" style="margin-top: 0px !important; margin-right: 0px !important; margin-bottom: 1px !important; margin-left: 45px !important; background-color: #ffffff; color: #5c5c5c; padding: 0px;">
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">import</span> os</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span> seg(fn):</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># convert to UTF-8 once; -c drops characters that cannot be converted</span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span> os.path.isfile(fn + <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'.utf8'</span>):</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cmd = <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'iconv -f gb2312 -t utf8 -c %s &gt; %s.utf8'</span> % (fn, fn)</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">print</span>(cmd)</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;os.system(cmd)</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># segment the UTF-8 copy with bamboo's crf_seg parser</span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;cmd = <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'/opt/bamboo/bin/bamboo -p crf_seg %s.utf8 &gt; %s.seg'</span> % (fn, fn)</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">print</span>(cmd)</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;os.system(cmd)</span></li>
</ol>
</div>
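<p>The conversion step above shells out to iconv. Where iconv is not available, the same re-encoding can probably be done with Python's standard library alone. This is a minimal sketch under that assumption; the function name <code>to_utf8</code> is hypothetical, and <code>errors="ignore"</code> is used to approximate what iconv's <code>-c</code> flag does (dropping input that cannot be converted).</p>

```python
import io


def to_utf8(src, dst, src_encoding="gb2312"):
    """Re-encode a text file as UTF-8, dropping bytes that cannot be decoded."""
    # errors="ignore" silently skips undecodable bytes instead of raising.
    with io.open(src, "r", encoding=src_encoding, errors="ignore") as fin:
        text = fin.read()
    with io.open(dst, "w", encoding="utf-8") as fout:
        fout.write(text)
    return dst


# Usage (hypothetical file name): to_utf8("10.txt", "10.txt.utf8")
```

<p>Reading the whole file into memory is fine for corpus files of a few kilobytes each; a very large file would be better converted in chunks.</p>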
<p> </p>
<p>The segmentation result looks like this:</p>
<p>一 家 刚刚 成立 两 年 的 网络 支付 公司 , 它 的 目标 是 成为 市值 100亿 美元 的 上市 公司 。<br>这家 公司 叫做 快 钱 , 说 这 句 话 的 是 快钱 的 CEO 关 国光 。 他 之前 曾 任 网易 的 高级 副 总裁 , 负责 过 网易 的 上市 工作 。 对于 为什么 选择 第三 方 支付 作为 创业 方向 , 他 曾经 对 媒体 这样 说 : “ 我 能 看到 这个 胡同 对面 是 什么 , 别人 只能 看到 这个 胡同 。 ” 自信 与 狂妄 只 有 一 步 之 遥 ―― 这 几乎 是 所有 创业者 的 共同 特征 , 是 自信 还是 狂妄 也许 需要 留待 时间 来 考证 。</p>
<p> </p>
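<p><span style="font-size: medium;"><strong>3. Feature extraction</strong></span></p>
<p> Isn't SVM supposed to be good at high-dimensional pattern recognition? Why extract features at all instead of treating every word as a feature? A vocabulary of more than a hundred thousand words is common. Selecting the words that discriminate well between categories (GDP, CPI and 股票 (stocks) discriminate strongly for the economy category, while high-frequency words such as 我们 (we), 大家 (everyone) and 一起 (together) do not) reduces the amount of computation and should also improve classification quality.</p>
<p> The chi-square test (CHI) and information gain (IG) are reported to work well for feature selection; I chose CHI. Look up both concepts online for the details.</p>
<p> First, count how many times each term occurs in the documents.</p>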
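<p>For readers who want a concrete picture of the chi-square score mentioned above, here is a minimal sketch of the usual 2x2 formulation used for text feature selection. It is an illustration only and not the code used later in this post; the function name <code>chi_square</code> and the argument names are hypothetical.</p>

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 document-count table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term

    score = N * (a*d - b*c)^2 / ((a+c) * (b+d) * (a+b) * (c+d)), N = a+b+c+d
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Degenerate table: the term (or the class) covers all or no documents.
        return 0.0
    return float(n) * (a * d - b * c) ** 2 / denominator


# A term concentrated in one class scores higher than an evenly spread one.
print(chi_square(40, 10, 60, 890) > chi_square(25, 225, 75, 675))
```

<p>Terms are then ranked by this score within each class and only the top-scoring ones are kept as features.</p>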
<p> </p>
<div class="dp-highlighter bg_python" style="font-family: Consolas, 'Courier New', Courier, mono, serif; font-size: 12px; background-color: #e7e5dc; width: 687px; margin-top: 18px !important; margin-right: 0px !important; margin-bottom: 18px !important; margin-left: 0px !important; padding-top: 1px;">
<ol class="dp-py" style="margin-top: 0px !important; margin-right: 0px !important; margin-bottom: 1px !important; margin-left: 45px !important; background-color: #ffffff; color: #5c5c5c; padding: 0px;">
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># ignore whitespace, punctuation and entity residue such as 'nbsp'</span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">IGNORED_TERMS = (<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'nbsp'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' '</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'\t'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'\n'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">','</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'。'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'!'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'、'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'―'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'?'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'@'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">':'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'#'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'%'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&amp;'</span>,</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'('</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">')'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'《'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'》'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'['</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">']'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'{'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'}'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'*'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'.'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">';'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'-'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&lt;'</span>, <span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'&gt;'</span>)</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span> ingore(s):</span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">return</span> s <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span> IGNORED_TERMS</span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># count how many times each term occurs in one file</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> getterm(fn): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> fnobj = open(fn,<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'r'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> data = fnobj.read() </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> fnobj.close() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> arr = data.split(<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' '</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm = dict() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">for</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> a </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> arr: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> a = a.strip(<span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">' \n\t'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> ingore(a) </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">and</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> len( a.decode(</span><span class="string" style="color: blue; background-color: inherit; padding: 0px; margin: 0px;">'utf-8'</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">)) >=</span><span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">2</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> times = docterm.get(a) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> times: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm[a] = times + <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">else</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm[a] = <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">return</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># cls_term: maps each class to {term: number of articles containing it}</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;"># term_cls: maps each term to {class: number of articles containing it}</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">def</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> stat(</span><span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">,fn,cls_term,term_cls): </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm = getterm(fn) </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi = cls_term.get(<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi = dict() </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> cls_term[<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">] = termdi </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="comment" style="color: #008200; background-color: inherit; padding: 0px; margin: 0px;">#term,times</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">for</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> t </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">in</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> docterm.iterkeys(): </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = termdi.get(t) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">0</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> termdi[t] = artcount + <span class="number" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">1</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi = term_cls.get(t) </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> clsdi = {} </span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> term_cls[t] = clsdi </span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #ffffff; color: inherit; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount = clsdi.get(<span class="special" style="color: black; background-color: inherit; padding: 0px; margin: 0px;">cls</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;">) </span></span></li>
<li style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: solid; border-color: initial; border-left-width: 3px; border-left-color: #6ce26c; background-color: #f8f8f8; color: #5c5c5c; line-height: 18px; margin: 0px !important;"><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> <span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">if</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> </span><span class="keyword" style="color: #006699; background-color: inherit; font-weight: bold; padding: 0px; margin: 0px;">not</span><span style="color: black; background-color: inherit; padding: 0px; margin: 0px;"> artcount: </span></span></li>
<li class="alt" style="padding-top: 0px !important; padding-right: 3px !important; padding-bottom: 0px !important; padding-left: 10px !important; border-top-style: none; border-ri