Reading and writing CSV files with pandas, and things to watch out for
程序员文章站
2022-04-07 17:29:28
Writing a CSV file
Reference: https://www.jianshu.com/p/bda2f982c22b
import pandas as pd
from pandas import DataFrame,Series
def to_csv():
    data = {"name":['google','baidu','yahoo'],"marks":[100,200,300],"price":[1,2,3]}
    # f1: explicit column order and a custom index (a, b, c)
    f1=DataFrame(data,columns=['name','price','marks'],index=['a','b','c'])
    print(f1)
    # df: default column order and the default integer index (0, 1, 2)
    df=pd.DataFrame(data)
    df.to_csv('pandas.csv',header=True,index=True)
if __name__ == "__main__":
    to_csv()
print output:
Resulting CSV:
1. With index=False:
df.to_csv('pandas.csv',header=True,index=False)
Resulting CSV:
2. With header=False and index=False:
df.to_csv('pandas.csv',header=False,index=False)
Resulting CSV:
3. Writing f1's data to a CSV file:
f1.to_csv('f1.csv',header=True,index=True)
Resulting CSV:
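The four variants above can be compared side by side with a minimal sketch that renders the frames to text instead of files. It relies on to_csv returning the CSV as a string when no path is given; the variable names are illustrative, and splitlines() is used so the comparison does not depend on the platform's line ending.

```python
import pandas as pd

data = {"name": ['google', 'baidu', 'yahoo'], "marks": [100, 200, 300], "price": [1, 2, 3]}
df = pd.DataFrame(data)
f1 = pd.DataFrame(data, columns=['name', 'price', 'marks'], index=['a', 'b', 'c'])

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_both = df.to_csv(header=True, index=True).splitlines()
no_index = df.to_csv(header=True, index=False).splitlines()
no_header_no_index = df.to_csv(header=False, index=False).splitlines()
f1_lines = f1.to_csv(header=True, index=True).splitlines()

print(with_both[0])           # ,name,marks,price   (the index column has no name)
print(with_both[1])           # 0,google,100,1
print(no_index[0])            # name,marks,price
print(no_header_no_index[0])  # google,100,1
print(f1_lines[1])            # a,google,1,100
```

Note that index=True writes an unnamed first column, which matters when the file is read back.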
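4. Writing a large data set to CSV
Approach: first define the layout of data, i.e. the field names, each mapped to an empty list. Then append input_data to the matching fields row by row, and finally export to a CSV file.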
import pandas as pd
from pandas import DataFrame,Series
def to_csv(input_data):
    # One empty list per field; every row appends one value to each list
    data = {"author":[],
            "org":[],
            "title":[],
            "origin":[],
            "date":[],
            "kw":[],
            "abstract":[]
            }
    for one_sample in input_data:
        # Each sample carries its fields under the '_source' key
        row = one_sample['_source']
        data['author'].append(row['author'])
        data['org'].append(row['org'])
        data['title'].append(row['title'])
        data['origin'].append(row['origin'])
        data['date'].append(row['date'])
        data['kw'].append(row['kw'])
        data['abstract'].append(row['abstract'])
    res=DataFrame(data, columns=["author","org","title","origin","date","kw","abstract"])
    res.to_csv("res.csv", header=True, index=True)
Resulting CSV:
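As a sketch of the same idea in fewer lines: DataFrame also accepts a list of dicts, so the '_source' records can be passed in directly. The sample input_data below is made up for illustration (the original only shows that each item has a '_source' dict with these seven fields), and records_to_frame is a hypothetical helper name.

```python
import pandas as pd

FIELDS = ["author", "org", "title", "origin", "date", "kw", "abstract"]

def records_to_frame(input_data):
    # Collect the '_source' dict of every sample; columns= fixes the column order.
    rows = [sample['_source'] for sample in input_data]
    return pd.DataFrame(rows, columns=FIELDS)

# Made-up records, shaped like the input the original function expects.
input_data = [
    {"_source": {"author": "A1", "org": "Org1", "title": "T1", "origin": "J1",
                 "date": "2020-01-01", "kw": "k1", "abstract": "first"}},
    {"_source": {"author": "A2", "org": "Org2", "title": "T2", "origin": "J2",
                 "date": "2020-02-01", "kw": "k2", "abstract": "second"}},
]

res = records_to_frame(input_data)
csv_text = res.to_csv(header=True, index=True)  # pass a path instead to write a file
print(csv_text)
```

One behavioral difference from the row-by-row version: a record that lacks a field produces an empty cell here, whereas row['author'] would raise KeyError.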
Reading a CSV file
df = pd.read_csv("E:/1StudyData/f辅助阅读/src/res.csv")
1. Error: the path contains Chinese characters
Result:
Fix:
Reference: https://blog.csdn.net/qq_35318838/article/details/80564938
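import pandas as pd
def getContent(path):
    # Let Python open the file and hand the file object to read_csv;
    # the with block closes the file once it has been read
    with open(path) as f:
        df = pd.read_csv(f)
    print(len(df))
    print(df.head(2))
if __name__ == "__main__":
    getContent("E:/1StudyData/f辅助阅读/src/res.csv")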
Result:
Note: if you run into encoding problems, pass an encoding explicitly, e.g. open(path, encoding="utf-8")
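A minimal sketch of the workaround together with the encoding note. It assumes the file is UTF-8 encoded; the helper name read_csv_via_open, the temporary folder, and the sample rows are all made up so that the example can run anywhere.

```python
import os
import tempfile

import pandas as pd

def read_csv_via_open(path, encoding="utf-8"):
    # Python opens the file, so pandas only ever sees a file object, not the path.
    with open(path, encoding=encoding, newline="") as f:
        return pd.read_csv(f)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder with Chinese characters in its name.
    folder = os.path.join(tmp, "辅助阅读")
    os.makedirs(folder)
    path = os.path.join(folder, "res.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("title,abstract\nT1,摘要一\nT2,摘要二\n")
    df = read_csv_via_open(path)

print(len(df))
print(df.head(2))
```

If the file was saved with a different encoding, pass that name instead of "utf-8".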
2. Using the first column as the index in read_csv()
Reference: https://blog.csdn.net/a19990412/article/details/82734244
df = pd.read_csv(open(path), index_col=0)
Result:
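To illustrate what index_col=0 changes, the sketch below reads the same CSV text twice from an in-memory buffer. The text mirrors what f1.to_csv(header=True, index=True) writes; using io.StringIO instead of a file on disk is a choice made for this example only.

```python
import io

import pandas as pd

# CSV with an unnamed first column, as written by to_csv(index=True).
csv_text = ",name,price,marks\na,google,1,100\nb,baidu,2,200\nc,yahoo,3,300\n"

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # ['Unnamed: 0', 'name', 'price', 'marks']
print(list(indexed.columns))  # ['name', 'price', 'marks']
print(list(indexed.index))    # ['a', 'b', 'c']
```

Without index_col, the saved index comes back as an ordinary column named "Unnamed: 0" and a fresh integer index is added.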
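(For experiments, Jupyter is recommended: it keeps earlier variable values in memory, which a .py script does not, and the output is displayed more neatly, as in the figure below.)
Working with the DataFrame
1. Getting a value from the 'abstract' field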
df['abstract'][0]
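Note that df['abstract'][0] looks up the label 0 in the column's index, so it returns the first row only while the index is the default 0, 1, 2, and so on. A sketch with made-up data showing the explicit label-based (.loc) and position-based (.iloc) forms:

```python
import pandas as pd

# Made-up frame standing in for the one read from res.csv.
df = pd.DataFrame({
    "title": ["T1", "T2"],
    "abstract": ["first abstract", "second abstract"],
})

by_chain = df['abstract'][0]          # label 0 in the default integer index
by_label = df.loc[0, 'abstract']      # same lookup, written as one step
by_position = df['abstract'].iloc[0]  # first row, whatever the labels are

# After changing the index, the label is 'T1', but position 0 is unchanged.
relabelled = df.set_index('title')
by_new_label = relabelled.loc['T1', 'abstract']
still_first = relabelled['abstract'].iloc[0]

print(by_chain, by_label, by_position, by_new_label, still_first, sep="\n")
```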
Miscellaneous
1. Counting Chinese characters:
Reference: https://blog.csdn.net/xiamoyanyulrq/article/details/81504114
def str_count2(text):
    count = 0
    for s in text:
        # Range of Chinese characters (CJK Unified Ideographs, U+4E00 to U+9FFF)
        if '\u4e00' <= s <= '\u9fff':
            count += 1
    return count
x = df['abstract'][0]
count = str_count2(x)
print(count)
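The same range check can be applied to a whole column rather than a single cell. The following is a sketch, not the original author's code: count_chinese is a hypothetical compact variant of the function above, and the sample strings and the cn_chars column name are made up.

```python
import pandas as pd

def count_chinese(text):
    # Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
    return sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')

df = pd.DataFrame({"abstract": ["pandas读写csv文件", "no Chinese here", "注意事项,及其他"]})

# apply runs the function once per cell of the column.
df["cn_chars"] = df["abstract"].apply(count_chinese)
print(df["cn_chars"].tolist())  # [4, 0, 7]
```

The full-width comma in the third string is outside the range, so punctuation is not counted.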
2. Sorting a dict by value in descending order
tmp = sorted(usr_dict.items(), key = lambda kv:-int(kv[1]))
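A small runnable sketch of that one-liner. The contents of usr_dict are made up (the original does not show them), and the reverse=True form is an equivalent alternative to negating the key, not something the original uses.

```python
# Hypothetical data; values may be ints or numeric strings, hence the int() call.
usr_dict = {"alice": "3", "bob": 10, "carol": 7}

# Form used above: negate the numeric value so the largest sorts first.
by_negation = sorted(usr_dict.items(), key=lambda kv: -int(kv[1]))

# Alternative: keep the key positive and ask sorted() to reverse the order.
by_reverse = sorted(usr_dict.items(), key=lambda kv: int(kv[1]), reverse=True)

print(by_negation)  # [('bob', 10), ('carol', 7), ('alice', '3')]
```

Both forms return a list of (key, value) tuples; the dict itself is left unchanged.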