Text Word Frequency
2022-04-27 10:43:20
![](https://img-blog.csdnimg.cn/20201111153524172.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl81MDUxMDkxNQ==,size_16,color_FFFFFF,t_70)...
This is a bit simpler in Python:

    def newdic(dicts, n):
        # Return the n most frequent (word, count) pairs, highest count first.
        list1 = sorted(dicts.items(), key=lambda x: x[1])
        return list1[-1:-(n+1):-1]

    # Read the whole file; "with" closes it when the block ends.
    with open(r"E:\VS Code\us_constitution.txt", "r", encoding="utf-8") as f:
        txt = f.read().lower().split()

    # Count how often each word occurs.
    dic = {}
    for word in txt:
        if word in dic:
            dic[word] = dic[word] + 1
        else:
            dic[word] = 1

    # Drop common function words. pop() with a default avoids the KeyError
    # that del raises when a word does not occur in the text.
    for stop in ['the', 'of', 'be', 'or', 'my', 'i', 'and', 'in', 'a', 'by', 'for', 'which',
                 'any', 'such', 'as', 'have', 'on', 'he', 'is', 'from']:
        dic.pop(stop, None)

    print(newdic(dic, 100))

Conjunctions and other common function words are removed with a crude, brute-force approach: a hard-coded list.
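One possible tidier variant, as a minimal sketch rather than the post's own method: keep the stop words in a set and let `collections.Counter` from the standard library do the counting. The names `STOP_WORDS` and `top_words` are hypothetical, chosen for illustration; the word list is the one used in the post.

```python
from collections import Counter

# Stop words taken from the post; STOP_WORDS is a hypothetical name.
STOP_WORDS = {"the", "of", "be", "or", "my", "i", "and", "in", "a", "by",
              "for", "which", "any", "such", "as", "have", "on", "he", "is", "from"}

def top_words(text, n, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, skipping stop words."""
    # Filter while iterating, so stop words are never counted at all.
    words = (w for w in text.lower().split() if w not in stop_words)
    return Counter(words).most_common(n)

sample = "The people of the United States and the people of the world"
print(top_words(sample, 2))
```

Filtering before counting means nothing has to be deleted afterwards, so a stop word that is missing from the text causes no error.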
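- Some thoughts on a C implementation:
The number of distinct words cannot be known in advance, so dynamic allocation is required.
By analogy with Python's dict, create a similar struct that stores each word together with its frequency.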
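The layout described above can be modelled in Python before porting it to C. This is only a rough sketch under stated assumptions: `Entry` stands in for a C struct holding a word and a count, `WordTable` stands in for a heap-allocated array of such structs, and the doubling step marks where a C version would call `realloc`. All names, the linear search, and the doubling policy are assumptions for illustration, not part of the original post.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    # Stands in for a C struct with a word pointer and an integer count.
    word: str
    count: int

class WordTable:
    def __init__(self, capacity=4):
        self.capacity = capacity          # slots currently allocated
        self.size = 0                     # slots actually in use
        self.entries = [None] * capacity  # stands in for a malloc'd array

    def add(self, word):
        # Linear search over the used slots, as a simple C version might do.
        for i in range(self.size):
            if self.entries[i].word == word:
                self.entries[i].count += 1
                return
        if self.size == self.capacity:
            # Table is full: double it. In C this would be a realloc call.
            self.entries.extend([None] * self.capacity)
            self.capacity *= 2
        self.entries[self.size] = Entry(word, 1)
        self.size += 1

table = WordTable(capacity=2)
for w in "we the people of the united states".split():
    table.add(w)
print(table.size, table.capacity)
```

Linear search keeps the sketch short but costs time proportional to the number of distinct words on every insert; a hash table would be closer to how Python's dict behaves.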
Article URL: https://blog.csdn.net/weixin_50510915/article/details/109624172