Python examples of forward and backward maximum matching word segmentation
程序员文章站
2022-06-08 21:58:10
Forward maximum matching
# -*- coding:utf-8 -*-
# Python 2 code: uses the print statement and the unicode type.
codec = 'utf-8'


def u(s, encoding):
    'Convert a string in another encoding to unicode.'
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)


def fwd_mm_seg(worddict, maxlen, str):
    'forward max match segment'
    wordlist = []
    segstr = str
    segstrlen = len(segstr)
    # Debug output: list every dictionary entry.
    for word in worddict:
        print 'word: ', word
    print "\n"
    while segstrlen > 0:
        # Start with the longest candidate allowed at the head of the string.
        if segstrlen > maxlen:
            wordlen = maxlen
        else:
            wordlen = segstrlen
        substr = segstr[0:wordlen]
        print "substr: ", substr
        # Drop characters from the right until the candidate is in the
        # dictionary or only one character is left.
        while wordlen > 1:
            if substr in worddict:
                print "substr1: %r" % substr
                break
            else:
                print "substr2: %r" % substr
                wordlen = wordlen - 1
                substr = substr[0:wordlen]
        # print "substr3: ", substr
        wordlist.append(substr)
        # Continue with the rest of the string after the matched word.
        segstr = segstr[wordlen:]
        segstrlen = segstrlen - wordlen
    for wordstr in wordlist:
        print "wordstr: ", wordstr
    return wordlist


def main():
    # words.dic: one UTF-8 encoded word per line.
    fp_dict = open('words.dic')
    worddict = {}
    for eachword in fp_dict:
        worddict[u(eachword.strip(), 'utf-8')] = 1
    segstr = u'你好世界hello world'
    print segstr
    wordlist = fwd_mm_seg(worddict, 10, segstr)
    print "==".join(wordlist)


if __name__ == '__main__':
    main()
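The listing above is Python 2 and reads its dictionary from words.dic, whose contents are not shown. As a minimal sketch, assuming Python 3 and a small in-memory dictionary, the same greedy left-to-right scan might look like this. The function name `forward_max_match` and the four sample words are hypothetical, chosen only for illustration.

```python
def forward_max_match(text, words, max_len):
    """Greedy left-to-right segmentation.

    At each position take the longest dictionary word of at most
    max_len characters, falling back to a single character.
    """
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it from the right.
        size = min(max_len, len(text) - pos)
        while size > 1 and text[pos:pos + size] not in words:
            size -= 1
        result.append(text[pos:pos + size])
        pos += size
    return result


# Hypothetical dictionary standing in for the contents of words.dic.
words = {"你好", "世界", "hello", "world"}
segments = forward_max_match("你好世界hello world", words, 10)
print("==".join(segments))
```

With this dictionary the space between "hello" and "world" comes out as its own one-character segment, because any character that starts no dictionary word is emitted alone.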
Backward maximum matching
# -*- coding:utf-8 -*-
# Python 2 code: uses the print statement and the unicode type.


def u(s, encoding):
    'Convert a string in another encoding to unicode.'
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)


codec = 'utf-8'


def bwd_mm_seg(worddict, maxlen, str):
    'backward max match segment'
    wordlist = []
    segstr = str
    segstrlen = len(segstr)
    # Debug output: list every dictionary entry.
    for word in worddict:
        print 'word: ', word
    print "\n"
    while segstrlen > 0:
        # Start with the longest candidate allowed at the tail of the string.
        if segstrlen > maxlen:
            wordlen = maxlen
        else:
            wordlen = segstrlen
        substr = segstr[-wordlen:None]
        print "substr: ", substr
        # Drop characters from the left until the candidate is in the
        # dictionary or only one character is left.
        while wordlen > 1:
            if substr in worddict:
                print "substr1: %r" % substr
                break
            else:
                print "substr2: %r" % substr
                wordlen = wordlen - 1
                substr = substr[-wordlen:None]
        # print "substr3: ", substr
        wordlist.append(substr)
        # Continue with the part of the string before the matched word.
        segstr = segstr[0: -wordlen]
        segstrlen = segstrlen - wordlen
    # Words were collected from right to left, so restore reading order.
    wordlist.reverse()
    for wordstr in wordlist:
        print "wordstr: ", wordstr
    return wordlist


def main():
    # words.dic: one UTF-8 encoded word per line.
    fp_dict = open('words.dic')
    worddict = {}
    for eachword in fp_dict:
        worddict[u(eachword.strip(), 'utf-8')] = 1
    segstr = ur'你好世界hello world'
    print segstr
    wordlist = bwd_mm_seg(worddict, 10, segstr)
    print "==".join(wordlist)


if __name__ == '__main__':
    main()
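Backward matching scans from the end of the string instead of the start, and the two directions can disagree on ambiguous input. The following is a minimal Python 3 sketch of the backward scan, assuming a small in-memory dictionary. The function name `backward_max_match`, the sample sentence (roughly "study the origin of life"), and the dictionary entries are hypothetical and do not come from the original listing.

```python
def backward_max_match(text, words, max_len):
    """Greedy right-to-left segmentation.

    At each step take the longest dictionary word of at most max_len
    characters that ends at the current position, falling back to a
    single character.
    """
    result = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, then shrink it from the left.
        size = min(max_len, end)
        while size > 1 and text[end - size:end] not in words:
            size -= 1
        result.append(text[end - size:end])
        end -= size
    # Words were collected from right to left, so restore reading order.
    result.reverse()
    return result


# Hypothetical dictionary containing an overlapping pair: 研究生 and 生命.
words = {"研究", "研究生", "生命", "起源"}
segments = backward_max_match("研究生命的起源", words, 3)
print("/".join(segments))
```

Here the backward scan yields 研究/生命/的/起源. A forward scan over the same dictionary would take the three-character word 研究生 first and leave 命 as a single character, which is why the two methods are often run together and compared.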
That is all for this example of forward and backward maximum matching word segmentation in Python. I hope it serves as a useful reference.