
Basics: Common Pandas Operations for Data Analysis (1)

Posted on 程序员文章站, 2022-03-30 08:24:46

Getting and understanding the data

Import the necessary libraries

import pandas as pd
import numpy as np
import io  # used to hand the JSON string to pd.read_json as a file-like object

Build the DataFrame from a JSON string (store sales data, one row per style/color pair; the values are not random)

data = '{"\\u6b3e\\u53f7\\u7f16\\u7801":{"0":11059080069,"1":11059080070,"2":11059080070,"3":11059080071,"4":11059080071,"5":11059080071,"6":11059081050,"7":11059081050,"8":11059081052,"9":11059200049,"10":11059200130,"11":11059230015,"12":11059240017,"13":11059240022,"14":11059240034,"15":11059240108,"16":11059240108,"17":11059240118,"18":11059240118,"19":11059240120,"20":11059240120,"21":11059240127,"22":11059240127,"23":11059241002,"24":11059241002,"25":11059241003,"26":11059241003,"27":11059241015,"28":11059241015,"29":11059241022,"30":11059241023,"31":11059241031,"32":11059241031,"33":11059241038,"34":11059241038,"35":11059241067,"36":11059241079,"37":11059241098,"38":11059241098,"39":11059241103,"40":11059241103},"\\u8272\\u53f7\\u7f16\\u7801":{"0":800,"1":3445,"2":7480,"3":870,"4":3445,"5":7480,"6":3430,"7":9000,"8":820,"9":850,"10":890,"11":840,"12":840,"13":810,"14":840,"15":840,"16":1100,"17":840,"18":870,"19":820,"20":870,"21":840,"22":890,"23":830,"24":890,"25":820,"26":870,"27":822,"28":830,"29":820,"30":840,"31":820,"32":822,"33":850,"34":870,"35":850,"36":840,"37":840,"38":870,"39":820,"40":870},"\\u6700\\u65e9\\u5230\\u5e97\\u65e5\\u671f":{"0":20190819,"1":20190819,"2":20190819,"3":20190708,"4":20190708,"5":20190708,"6":20190708,"7":20190708,"8":20190819,"9":20190619,"10":20190619,"11":20190619,"12":20190708,"13":20190619,"14":20190708,"15":20190813,"16":20190813,"17":20190723,"18":20190723,"19":20190708,"20":20190708,"21":20190813,"22":20190813,"23":20190723,"24":20190723,"25":20190723,"26":20190723,"27":20190708,"28":20190708,"29":20190813,"30":20190708,"31":20190624,"32":20190624,"33":20190723,"34":20190723,"35":20190723,"36":20190813,"37":20170101,"38":20190708,"39":20190813,"40":20190813},"\\u5728\\u624b\\u5e93\\u5b58":{"0":15,"1":16,"2":14,"3":16,"4":17,"5":13,"6":17,"7":18,"8":18,"9":15,"10":0,"11":23,"12":26,"13":0,"14":24,"15":23,"16":24,"17":18,"18":15,"19":9,"20":26,"21":24,"22":24,"23":33,"24":18,"25":28,"26":29,"27":21,"28":28,"29":27,"30":40,"31":0,"32":19,"33":28,"34":28,"35":31,"36":27,"37":0,"38":25,"39":29,"40":27},"\\u5728\\u624b\\u6b3e\\u8272\\u6570":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":1,"7":1,"8":1,"9":1,"10":0,"11":1,"12":1,"13":0,"14":1,"15":1,"16":1,"17":1,"18":1,"19":1,"20":1,"21":1,"22":1,"23":1,"24":1,"25":1,"26":1,"27":1,"28":1,"29":1,"30":1,"31":0,"32":1,"33":1,"34":1,"35":1,"36":1,"37":0,"38":1,"39":1,"40":1},"14\\u5929\\u52a8\\u9500\\u6b3e\\u8272\\u6570":{"0":1.0,"1":1.0,"2":1.0,"3":null,"4":0.0,"5":1.0,"6":1.0,"7":0.0,"8":null,"9":1.0,"10":0.0,"11":1.0,"12":1.0,"13":0.0,"14":0.0,"15":1.0,"16":0.0,"17":1.0,"18":1.0,"19":1.0,"20":0.0,"21":0.0,"22":1.0,"23":1.0,"24":1.0,"25":1.0,"26":1.0,"27":1.0,"28":1.0,"29":1.0,"30":1.0,"31":0.0,"32":1.0,"33":null,"34":null,"35":1.0,"36":1.0,"37":0.0,"38":1.0,"39":1.0,"40":1.0},"14\\u5929\\u96f6\\u552e\\u91cf":{"0":1.0,"1":1.0,"2":2.0,"3":null,"4":0.0,"5":3.0,"6":1.0,"7":0.0,"8":null,"9":2.0,"10":0.0,"11":1.0,"12":2.0,"13":0.0,"14":0.0,"15":1.0,"16":0.0,"17":3.0,"18":7.0,"19":7.0,"20":0.0,"21":0.0,"22":1.0,"23":9.0,"24":6.0,"25":8.0,"26":1.0,"27":1.0,"28":1.0,"29":4.0,"30":3.0,"31":1.0,"32":4.0,"33":null,"34":null,"35":1.0,"36":1.0,"37":null,"38":1.0,"39":1.0,"40":1.0},"\\u96f6\\u552e\\u91d1\\u989d\\uff0814\\u5929\\uff09":{"0":"179.5","1":"242.2","2":"391.7","3":null,"4":"0","5":"647.5","6":"242.2","7":"0","8":null,"9":"257.6","10":"0","11":"125","12":"199","13":"0","14":"0","15":"193.6","16":"0","17":"493","18":"1,114.80","19":"1,139.50","20":"0","21":"0","22":"193.6","23":"1,874.70","24":"1,106.60","25":"1,320.80","26":"199","27":"145","28":"145","29":"682.6","30":"521.4","31":"199","32":"714.8","33":null,"34":null,"35":"274.6","36":"177.4","37":null,"38":"193.6","39":"177.4","40":"177.4"}}'
# pandas 2.1+ deprecates passing a literal JSON string to read_json, so wrap it in StringIO
init_data = pd.read_json(io.StringIO(data))
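A minimal sketch of the same read_json step on a tiny, hypothetical two-row subset of the data, assuming a recent pandas version. It shows that the \u escapes in the JSON keys decode to the Chinese column names, and that the {column: {row label: value}} layout becomes columns and rows. The names raw and df are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row subset in the same column-oriented layout as the article:
# {column name: {row label: value}}, with column names written as JSON \u escapes.
raw = (
    '{"\\u6b3e\\u53f7\\u7f16\\u7801": {"0": 11059080069, "1": 11059080070},'
    ' "\\u8272\\u53f7\\u7f16\\u7801": {"0": 800, "1": 3445}}'
)

# StringIO turns the string into a file-like object, which read_json accepts
# in every pandas version without the literal-string deprecation warning.
df = pd.read_json(io.StringIO(raw))

print(df.columns.tolist())  # ['款号编码', '色号编码']
print(df.shape)             # (2, 2)
```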

View the first 10 rows

init_data.head(10)

款号编码 色号编码 最早到店日期 在手库存 在手款色数 14天动销款色数 14天零售量 零售金额（14天）
0 11059080069 800 20190819 15 1 1.0 1.0 179.5
1 11059080070 3445 20190819 16 1 1.0 1.0 242.2
2 11059080070 7480 20190819 14 1 1.0 2.0 391.7
3 11059080071 870 20190708 16 1 NaN NaN None
4 11059080071 3445 20190708 17 1 0.0 0.0 0
5 11059080071 7480 20190708 13 1 1.0 3.0 647.5
6 11059081050 3430 20190708 17 1 1.0 1.0 242.2
7 11059081050 9000 20190708 18 1 0.0 0.0 0
8 11059081052 820 20190819 18 1 NaN NaN None
9 11059200049 850 20190619 15 1 1.0 2.0 257.6

How many rows does the DataFrame have?

init_data.shape[0]

41

How many columns does the DataFrame have?

init_data.shape[1]

8
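shape is a (rows, columns) tuple, so both counts can be read in one step. A small sketch on a hypothetical 3×2 frame standing in for init_data:

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame standing in for init_data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 3 2
print(len(df))             # 3, same as df.shape[0]
print(len(df.columns))     # 2, same as df.shape[1]
```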

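List all column names (in English: style code, color code, earliest in-store date, on-hand stock, style-colors on hand, style-colors with sales in the last 14 days, 14-day unit sales, 14-day retail amount)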

init_data.columns.values.tolist()

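['款号编码', '色号编码', '最早到店日期', '在手库存', '在手款色数', '14天动销款色数', '14天零售量', '零售金额（14天）']

Note that the last column name uses full-width parentheses （）, so code must spell it exactly that way.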

What does the DataFrame's index look like?

init_data.index

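Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
            34, 35, 36, 37, 38, 39, 40],
           dtype='int64')

pandas 2.0 removed Int64Index, so newer versions print the same labels as a plain Index with dtype='int64'.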
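The labels here happen to be 0..40, matching row positions, but labels and positions drift apart once rows are filtered or sorted. This matters for lookups such as sale['款号编码'][0] in Method 1 below, which reads label 0 and only works because reset_index(drop=True) runs first. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical frame whose labels were parsed from JSON keys "0", "1", "2".
df = pd.DataFrame({"qty": [5, 7, 9]}, index=[0, 1, 2])

# True when the labels are exactly 0..n-1, i.e. labels and positions coincide.
print(df.index.equals(pd.RangeIndex(len(df))))  # True

# Filtering keeps the old labels, so label 0 no longer exists.
filtered = df[df["qty"] > 5]
print(filtered.index.tolist())                         # [1, 2]
print(filtered.reset_index(drop=True).index.tolist())  # [0, 1]
```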

What is the largest value in the 14-day unit sales column (14天零售量)?

init_data['14天零售量'].max()

9.0
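max() skips missing values by default, and idxmax() returns the index label where that maximum sits, which is handy for pulling up the whole row. A sketch with hypothetical values that include a gap, like the NaN entries in 14天零售量:

```python
import pandas as pd

# Hypothetical unit-sales values with one missing entry.
s = pd.Series([1.0, None, 9.0, 3.0], name="units")

print(s.max())     # 9.0, the NaN is skipped
print(s.idxmax())  # 2, the label of the row holding the maximum
```

Applied to the article's data, init_data.loc[init_data['14天零售量'].idxmax()] would return the full row with the 9.0 sale.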

Which style sold best (by total 14-day retail amount)?

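Method 1: convert the amounts to numbers, total them per style, and sort in descending order

# Amounts are strings such as "1,114.80": strip the thousands separator, convert to float, treat missing values as 0
init_data['零售金额（14天）'] = init_data['零售金额（14天）'].apply(lambda x: float(str(x).replace(',', '')) if pd.notna(x) else 0.0)
# Total 14-day retail amount per style, largest first
sale = init_data.groupby(['款号编码'])['零售金额（14天）'].sum().reset_index()
sale.sort_values(by=['零售金额（14天）'], ascending=False, inplace=True)
# Renumber the rows so that label 0 is the top-selling style
sale = sale.reset_index(drop=True)
sale['款号编码'][0]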

11059241002
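The sort, reset and label lookup in Method 1 can be collapsed into idxmax() on the per-style totals, since the groupby result is already indexed by style. A sketch on hypothetical style/amount data (the column names style and amount are illustrative):

```python
import pandas as pd

# Hypothetical rows; style 101 appears twice, like a style sold in two colors.
df = pd.DataFrame({
    "style": [101, 101, 102, 103],
    "amount": [300.0, 250.0, 500.0, 100.0],
})

# Total amount per style (indexed by style), then the label of the largest total.
totals = df.groupby("style")["amount"].sum()
best_style = totals.idxmax()
print(best_style)  # 101, since 300 + 250 = 550 beats 500

# nlargest gives a ranked view without sorting a copy in place.
print(totals.nlargest(2))
```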

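Method 2: take the style of the single row with the highest amount. Strictly, this finds the best-selling style/color row rather than the style with the highest total; the two answers agree here because 11059241002 leads on both.

# Same conversion as in Method 1 (running it a second time is harmless)
init_data['零售金额（14天）'] = init_data['零售金额（14天）'].apply(lambda x: float(str(x).replace(',', '')) if pd.notna(x) else 0.0)
init_data.loc[init_data['零售金额（14天）'] == init_data['零售金额（14天）'].max(), '款号编码'].tolist()[0]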

11059241002
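To see why the two methods can disagree in general, here is a sketch on hypothetical data where one style has the largest single row but another style has the largest total:

```python
import pandas as pd

# Hypothetical data: style 101 sells in two colors, style 102 in one.
df = pd.DataFrame({"style": [101, 101, 102], "amount": [300.0, 250.0, 500.0]})

# Method 2 logic: the style of the single largest row.
row_best = df.loc[df["amount"].idxmax(), "style"]
# Method 1 logic: the style with the largest summed amount.
sum_best = df.groupby("style")["amount"].sum().idxmax()

print(row_best, sum_best)  # 102 101
```

For "which style sold best", the per-style total (Method 1) is usually the intended measure.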

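What is the total 14-day retail amount? (This needs the column to have been converted to numbers, as in the methods above.)

init_data['零售金额（14天）'].sum()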

13328.5
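The row-by-row apply used above works, but the same cleaning can be done in a vectorized way with the .str accessor and pd.to_numeric. A sketch on hypothetical text amounts with a thousands separator and a gap, shaped like the raw 零售金额（14天） column:

```python
import pandas as pd

# Hypothetical raw amounts stored as text (object dtype), with a missing value.
raw = pd.Series(["179.5", "1,114.50", None, "0"], dtype=object)

# Strip commas, parse as numbers, and treat missing values as 0.
amount = pd.to_numeric(raw.str.replace(",", "", regex=False)).fillna(0.0)

print(amount.tolist())  # [179.5, 1114.5, 0.0, 0.0]
print(amount.sum())     # 1294.0
```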

How many distinct styles does the DataFrame contain?

init_data['款号编码'].nunique()

26
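nunique() ignores missing values unless told otherwise, while unique() keeps them, so the two can give different counts on columns with gaps. A sketch on hypothetical codes:

```python
import pandas as pd

# Hypothetical codes with a duplicate and a missing value.
codes = pd.Series([11, 11, 12, None, 13])

print(codes.nunique())              # 3, NaN excluded by default
print(codes.nunique(dropna=False))  # 4, NaN counted as a value
print(len(codes.unique()))          # 4, unique() keeps NaN
```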

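On average, how many colors does each style have?

# Join the two codes as strings; adding the integers could make different style/color pairs collide
init_data['款色'] = init_data['款号编码'].astype(str) + '-' + init_data['色号编码'].astype(str)
init_data['款色'].nunique() / init_data['款号编码'].nunique()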

1.5769230769230769
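Summing integer codes to build a style/color key can undercount distinct pairs (in this dataset the sums happen not to collide). The sketch below uses hypothetical codes where they do collide, and shows two alternatives that avoid building a combined key at all:

```python
import pandas as pd

# Hypothetical codes chosen so that 100 + 20 == 110 + 10.
df = pd.DataFrame({
    "style": [100, 100, 110, 101],
    "color": [20, 30, 10, 800],
})

# Integer sum as a key: 120, 130, 120, 901, so only 3 "pairs" instead of 4.
print((df["style"] + df["color"]).nunique())  # 3

# String key with a separator keeps the pairs apart.
print((df["style"].astype(str) + "-" + df["color"].astype(str)).nunique())  # 4

# Distinct colors per style, averaged over styles.
per_style = df.groupby("style")["color"].nunique()
print(per_style.mean())  # 1.3333333333333333

# Equivalent: distinct (style, color) pairs divided by distinct styles.
pairs = df[["style", "color"]].drop_duplicates()
print(len(pairs) / df["style"].nunique())  # 1.3333333333333333
```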

Original article: https://blog.csdn.net/A_010001001110/article/details/107137299