Datawhale_day2
2022-07-14 23:11:47
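Exercises for this chapter
- Assuming that characters 3750, 900, and 648 are sentence punctuation marks, how many sentences does each news article in the competition dataset contain on average?
- Find the most frequent character in each news category.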
————————————————————————————————————————————
Exercise 1 code:
import os
import pandas as pd

# Path to the training set, relative to the current working directory
data_set = os.path.join(os.getcwd(), "数据集\\train_set.csv\\train_set.csv")
print(data_set)
train_df = pd.read_csv(data_set, sep='\t')

# Character ids assumed to be sentence punctuation marks
punctuation = ("3750", "900", "648")
sum_sentences, lines = 0, 0
for content in train_df["text"]:
    lines += 1
    # Count how often each character id occurs in this article
    num_dict = {}
    for key in content.split(" "):
        num_dict[key] = num_dict.get(key, 0) + 1
    # Sum the three counts explicitly; chaining conditional expressions with "+"
    # would only add the first punctuation mark that is present
    marks = sum(num_dict.get(p, 0) for p in punctuation)
    # An article without any punctuation mark still counts as one sentence
    sum_sentences += marks if marks > 0 else 1
print(sum_sentences / lines)
————————————————————————————————————————————
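Exercise 2 approach:
Use a dict to count how often each character occurs within each news category, then pick the character with the highest count in each category.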
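The approach above could be sketched roughly as follows. This is only a minimal illustration, not the author's code: the function name most_common_char_per_label, the exclude parameter, and the toy rows are all made up for the example, and it uses collections.Counter in place of a hand-written dict.

```python
from collections import Counter, defaultdict


def most_common_char_per_label(rows, exclude=()):
    """Return {label: (char, count)} with the most frequent character id per label.

    rows:    iterable of (label, text) pairs; text holds space-separated character ids
    exclude: character ids to ignore (hypothetical option, e.g. punctuation marks)
    """
    counters = defaultdict(Counter)
    for label, text in rows:
        # Accumulate character counts separately for each news category
        counters[label].update(ch for ch in text.split() if ch not in exclude)
    # most_common(1) returns a one-element list holding the top (char, count) pair
    return {label: c.most_common(1)[0] for label, c in counters.items() if c}


# Toy data for illustration only (not taken from the competition dataset)
sample = [
    (0, "3750 12 12 900 12"),
    (0, "12 7 3750"),
    (1, "648 5 5 648 648"),
]
result = most_common_char_per_label(sample)
result_no_punct = most_common_char_per_label(sample, exclude=("3750", "900", "648"))
print(result)
print(result_no_punct)
```

With the training data loaded as in Exercise 1, the rows could be supplied as zip(train_df["label"], train_df["text"]), assuming the category column is named "label" (the code above only confirms the "text" column). Since punctuation ids may turn out to be the most frequent in every category, passing them via exclude is one way to get a more informative answer.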