pandas Essentials (Data Structures)
程序员文章站
2022-05-11 14:38:32
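1.Series
A Series is a one-dimensional array, displayed with the index on the left and the values on the right. The sessions below do not show their imports; they appear to assume that Series and DataFrame were imported from pandas, with pandas available as pd and NumPy as np.

In [3]: obj = Series([1,2,3,4,5])

In [4]: obj
Out[4]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]: obj.values
Out[5]: array([1, 2, 3, 4, 5], dtype=int64)

In [6]: obj.index
Out[6]: RangeIndex(start=0, stop=5, step=1)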
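A minimal standalone sketch of the same steps, for readers who want to run them outside IPython. The pd alias is an assumption here, since the original imports are not shown.

```python
import pandas as pd

# Build a Series from a plain list; pandas supplies a default integer index.
obj = pd.Series([1, 2, 3, 4, 5])

values = obj.values   # the underlying NumPy array
index = obj.index     # the default RangeIndex

print(values.tolist())
print(list(index))
```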
Create an index that labels each data point:

In [7]: obj2 = Series([4,1,9,7], index=["a","c","e","ff"])

In [8]: obj2
Out[8]:
a     4
c     1
e     9
ff    7
dtype: int64

In [9]: obj2.index
Out[9]: Index(['a', 'c', 'e', 'ff'], dtype='object')
Select a single value or a group of values by label:

In [10]: obj2["c"]
Out[10]: 1

In [11]: obj2[["c","e"]]
Out[11]:
c    1
e    9
dtype: int64
Array operations such as boolean filtering keep the index attached to the result:

In [12]: obj2[obj2>3]
Out[12]:
a     4
e     9
ff    7
dtype: int64
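The selection and filtering steps can be sketched in one script. The variable names (single, subset, mask, filtered, doubled) are illustrative and not from the original session.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

single = obj2["c"]           # one label returns a scalar
subset = obj2[["c", "e"]]    # a list of labels returns a Series

mask = obj2 > 3              # element-wise comparison gives a boolean Series
filtered = obj2[mask]        # keeps only the rows where the mask is True

doubled = obj2 * 2           # arithmetic also keeps the labels

print(filtered)
print(doubled)
```

In each case the result carries the original labels, so values can still be looked up by label afterwards.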
A Series can also be treated as an ordered dictionary, so many dictionary operations work on it:

In [13]: "c" in obj2
Out[13]: True
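One detail worth spelling out: like a dictionary, the in operator tests the index labels (the "keys"), not the stored values. A short sketch, with illustrative variable names:

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

has_label = "c" in obj2         # "c" is an index label
label_miss = 9 in obj2          # 9 is a value, not a label
has_value = 9 in obj2.values    # test the data itself through .values

fallback = obj2.get("zz", 0)    # dict-style lookup with a default
as_dict = obj2.to_dict()        # convert back to a plain dictionary

print(has_label, label_miss, has_value, fallback)
```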
Create a Series directly from a dictionary:

In [14]: data = {"name":"liu","year":18,"sex":"man"}

In [15]: obj3 = Series(data)

In [16]: obj3
Out[16]:
name    liu
year     18
sex     man
dtype: object
Create a Series from a dictionary together with a list of index labels:

In [17]: list1 = ["name","year","mobile"]

In [18]: obj4 = Series(data,index=list1)

In [19]: obj4
Out[19]:
name      liu
year       18
mobile    NaN
dtype: object

Note: the data dictionary has no "mobile" key, so that value is NaN.
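The explicit index works in both directions: labels missing from the dictionary become NaN, and dictionary keys missing from the index are dropped. A minimal sketch reusing the data above:

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}

# Only the labels listed in index are kept, in the order given.
obj4 = pd.Series(data, index=["name", "year", "mobile"])

print(obj4)
print("sex" in obj4)   # the "sex" key was not requested, so it is absent
```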
Detect missing data, using either the top-level functions or the equivalent Series methods:

In [20]: pd.isnull(obj4)
Out[20]:
name      False
year      False
mobile     True
dtype: bool

In [21]: pd.notnull(obj4)
Out[21]:
name       True
year       True
mobile    False
dtype: bool

In [22]: obj4.isnull()
Out[22]:
name      False
year      False
mobile     True
dtype: bool

In [23]: obj4.notnull()
Out[23]:
name       True
year       True
mobile    False
dtype: bool
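The boolean result is typically used for counting or dropping missing entries. The sketch below goes slightly beyond the original session: dropna and the isna spelling are standard pandas, but were not used by the author.

```python
import pandas as pd

obj4 = pd.Series({"name": "liu", "year": 18}, index=["name", "year", "mobile"])

missing = obj4.isnull()
n_missing = int(missing.sum())   # True counts as 1, so the sum is the number missing
cleaned = obj4.dropna()          # a new Series without the missing entries

# isnull and isna are two names for the same check.
same = pd.isnull(obj4).equals(obj4.isna())

print(n_missing)
print(cleaned)
```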
Both the Series itself and its index have a name attribute:

In [7]: obj4.name = "hahaha"

In [8]: obj4.index.name = "state"

In [9]: obj4
Out[9]:
state
name      liu
year       18
mobile    NaN
Name: hahaha, dtype: object
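One place these names become visible is when the Series is turned into a DataFrame: the names are used as column labels. A small sketch; reset_index is not part of the original session.

```python
import pandas as pd

obj4 = pd.Series({"name": "liu", "year": 18}, index=["name", "year", "mobile"])
obj4.name = "hahaha"
obj4.index.name = "state"

# The index becomes a column named after index.name,
# and the values become a column named after the Series name.
frame = obj4.reset_index()

print(frame)
```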
2.DataFrame
Build a DataFrame from a dictionary of equal-length lists:

In [13]: data = {"state":[1,1,2,1,1], "year":[2000,2001,2002,2004,2005], "pop":[1.5,1.7,3.6,2.4,2.9]}

In [14]: frame = DataFrame(data)

In [15]: frame
Out[15]:
   state  year  pop
0      1  2000  1.5
1      1  2001  1.7
2      2  2002  3.6
3      1  2004  2.4
4      1  2005  2.9
Specify the row and column labels; a requested column that is not found in the data is filled with NA values:

In [18]: frame2 = DataFrame(data, columns=["year","state","pop","debt"], index=["one","two","three","four","five"])

In [19]: frame2
Out[19]:
       year  state  pop debt
one    2000      1  1.5  NaN
two    2001      1  1.7  NaN
three  2002      2  3.6  NaN
four   2004      1  2.4  NaN
five   2005      1  2.9  NaN
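A runnable sketch of the same construction, assuming the pd alias. It also checks that the columns argument controls column order.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],   # "debt" is not a key in data
    index=["one", "two", "three", "four", "five"],
)

all_missing = bool(frame2["debt"].isna().all())  # the whole column is NA

print(frame2)
print(all_missing)
```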
Retrieve a DataFrame column as a Series:

In [7]: frame2.year
Out[7]:
one      2000
two      2001
three    2002
four     2004
five     2005
Name: year, dtype: int64

Note: the returned Series keeps the DataFrame's index, and its name attribute is set to the column name.
Retrieve a row by label with loc:

In [11]: frame2.loc["three"]
Out[11]:
year     2002
state       2
pop       3.6
debt      NaN
Name: three, dtype: object
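Attribute access is convenient but has a pitfall that this very dataset exposes: a column named "pop" collides with the DataFrame.pop method, so only bracket access reaches the column. A sketch with illustrative variable names:

```python
import pandas as pd

frame2 = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2004, 2005],
        "state": [1, 1, 2, 1, 1],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
    },
    index=["one", "two", "three", "four", "five"],
)

year_attr = frame2.year       # attribute access
year_item = frame2["year"]    # bracket access, same result

pop_attr = frame2.pop         # the DataFrame.pop method, not the column
pop_col = frame2["pop"]       # the actual "pop" column

row = frame2.loc["three"]     # one row as a Series, named after its label

print(callable(pop_attr))
print(row)
```

Bracket access works for any column name, which makes it the safer default.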
Assign to a column; a scalar is broadcast to every row:

In [12]: frame2['debt'] = 16.5

In [13]: frame2
Out[13]:
       year  state  pop  debt
one    2000      1  1.5  16.5
two    2001      1  1.7  16.5
three  2002      2  3.6  16.5
four   2004      1  2.4  16.5
five   2005      1  2.9  16.5
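When assigning a list or array, its length must match the length of the DataFrame. When assigning a Series, values are matched exactly by index label, and rows without a matching label get NaN:

In [17]: val = Series([1.2,1.5,1.7], index=["two","four","five"])

In [18]: frame2['debt'] = val

In [19]: frame2
Out[19]:
       year  state  pop  debt
one    2000      1  1.5   NaN
two    2001      1  1.7   1.2
three  2002      2  3.6   NaN
four   2004      1  2.4   1.5
five   2005      1  2.9   1.7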
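Both rules can be checked in a short script. The "bad" column name and the length_error variable are illustrative; the failing list assignment is added here to show the length rule, and is not from the original session.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2004, 2005], "state": [1, 1, 2, 1, 1]},
    index=["one", "two", "three", "four", "five"],
)

# A Series is aligned on labels: unmatched rows become NaN.
val = pd.Series([1.2, 1.5, 1.7], index=["two", "four", "five"])
frame2["debt"] = val

# A list has no labels, so its length must equal the number of rows.
try:
    frame2["bad"] = [1, 2, 3]
    length_error = None
except ValueError as exc:
    length_error = exc

print(frame2)
print(type(length_error).__name__)
```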
Assigning to a column that does not exist creates it:

In [21]: frame2["eastern"] = frame2.state == 1

In [22]: frame2
Out[22]:
       year  state  pop  debt  eastern
one    2000      1  1.5   NaN     True
two    2001      1  1.7   1.2     True
three  2002      2  3.6   NaN    False
four   2004      1  2.4   1.5     True
five   2005      1  2.9   1.7     True
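A sketch of creating a derived column and removing it again. The del statement is an addition for completeness and does not appear in the original session.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2004, 2005], "state": [1, 1, 2, 1, 1]},
    index=["one", "two", "three", "four", "five"],
)

# Bracket assignment to a new name adds a column.
frame2["eastern"] = frame2["state"] == 1
eastern_values = frame2["eastern"].tolist()

# del removes a column in place, like deleting a dictionary key.
del frame2["eastern"]

print(eastern_values)
print(list(frame2.columns))
```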
For a nested dictionary, DataFrame interprets the outer keys as columns and the inner keys as the row index:

In [23]: dic = {"name":{"one":"liu","two":"rui"},"year":{"one":"23","two":"22"}}

In [24]: frame3 = DataFrame(dic)

In [25]: frame3
Out[25]:
    name year
one  liu   23
two  rui   22
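If the opposite orientation is wanted, the frame can be transposed. Note also that the year values in this example are strings, so a conversion is needed before doing arithmetic. A sketch; the .T and astype steps are additions and not part of the original session.

```python
import pandas as pd

dic = {
    "name": {"one": "liu", "two": "rui"},
    "year": {"one": "23", "two": "22"},
}

frame3 = pd.DataFrame(dic)   # outer keys -> columns, inner keys -> rows
flipped = frame3.T           # swap rows and columns

years = frame3["year"].astype(int)   # "23" and "22" are strings in the source data

print(flipped)
print(years.tolist())
```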
Set names for the row index and the columns; both are shown when the DataFrame is displayed:

In [26]: frame3.index.name = "index"

In [27]: frame3.columns.name = "state"

In [28]: frame3
Out[28]:
state name year
index
one    liu   23
two    rui   22
Return the data as a two-dimensional ndarray:

In [29]: frame3.values
Out[29]:
array([['liu', '23'],
       ['rui', '22']], dtype=object)
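The dtype of the returned array depends on the columns: a single dtype is chosen that can hold every column. A sketch with two hypothetical frames (numeric and mixed) illustrating this; to_numpy is the method form of the same conversion.

```python
import pandas as pd

# Integer and float columns fit together in a float array.
numeric = pd.DataFrame({"state": [1, 1, 2], "pop": [1.5, 1.7, 3.6]})
numeric_arr = numeric.to_numpy()

# Text and numbers have no common numeric type, so the result is an object array.
mixed = pd.DataFrame({"name": ["liu", "rui"], "year": [23, 22]})
mixed_arr = mixed.values

print(numeric_arr.dtype, numeric_arr.shape)
print(mixed_arr.dtype, mixed_arr.shape)
```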
3.Index objects

In [30]: obj = Series(range(3),index=["a","b","c"])

In [31]: index = obj.index

In [32]: index
Out[32]: Index(['a', 'b', 'c'], dtype='object')
Index objects are immutable, which makes it safe to share one Index among multiple data structures:

In [35]: index = pd.Index(np.arange(3))

In [36]: obj2 = Series([1.5,0.5,2],index=index)

In [37]: obj2.index is index
Out[37]: True
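Immutability can be observed directly: assigning to an element of an Index raises TypeError. The sketch below also builds two Series on the same Index and adds them; the s1, s2, and mutation_error names are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.Index(np.arange(3))

# In-place modification of an Index is rejected.
try:
    index[1] = 10
    mutation_error = None
except TypeError as exc:
    mutation_error = exc

# Because it cannot change underneath them, one Index can label several objects.
s1 = pd.Series([1.5, 0.5, 2.0], index=index)
s2 = pd.Series([10, 20, 30], index=index)
total = s1 + s2   # values are aligned on the shared labels

print(type(mutation_error).__name__)
print(total)
```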