欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

数据分析第三篇——Pandas之数据的分割与聚合

程序员文章站 2024-03-07 14:41:15
...

3.10 高级处理-分割与聚合

3.10.1 什么是分组与聚合

数据分析第三篇——Pandas之数据的分割与聚合

3.10.2 分组与聚合的API

  • DataFrame.groupby(by=分组依据的列标签, as_index=False)[聚合依据的DataFrame列标签]
    • key:分组列的数据,可以有多个
  • 案例:不同颜色不同笔的价格数据
  • Series的方法:
    • 聚合依据的DataFrame列.groupby(分组依据的DataFrame列标签)
col = pd.DataFrame({'color':['white', 'red', 'green', 'red', 'green'],
                   'object':['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                   'price1':[5.56, 4.20, 1.30, 0.56, 2.75],
                   'price2':[4.75, 4.12, 1.60, 0.75, 3.15]})
col
color object price1 price2
0 white pen 5.56 4.75
1 red pencil 4.20 4.12
2 green pencil 1.30 1.60
3 red ashtray 0.56 0.75
4 green pen 2.75 3.15
# 进行分组,对颜色分组,price1聚合
# 用DataFrame的方法分组
col.groupby(by='color')
col.groupby(by='color')['price1'].max()
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FD07361A48>


color
green    2.75
red      4.20
white    5.56
Name: price1, dtype: float64
# 用Series的方法进行分组
col['price1'].groupby(col['color']).max()
color
green    2.75
red      4.20
white    5.56
Name: price1, dtype: float64

3.10.3 星巴克零售店铺数据案例

starbucks = pd.read_csv(r'./data/directory.csv')
starbucks.head()
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51

# 按国家分组,求出每个国家的星巴克零售店数量

starbucks.groupby(by='Country').count()['Brand']
Country
AD        1
AE      144
AR      108
AT       18
AU       22
      ...  
TT        3
TW      394
US    13608
VN       25
ZA        3
Name: Brand, Length: 73, dtype: int64
starbucks.groupby(by='Country').count()['Brand'].sort_values(ascending=False)[:10].plot(kind='bar', figsize=(20, 8), fontsize=20)

数据分析第三篇——Pandas之数据的分割与聚合

按照国家和省市分组

starbucks.groupby(by=['Country', 'State/Province'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FD0D190188>
starbucks.groupby(by=['Country', 'State/Province']).count()
Brand Store Number Store Name Ownership Type Street Address City Postcode Phone Number Timezone Longitude Latitude
Country State/Province
AD 7 1 1 1 1 1 1 1 1 1 1 1
AE AJ 2 2 2 2 2 2 0 0 2 2 2
AZ 48 48 48 48 48 48 7 20 48 48 48
DU 82 82 82 82 82 82 16 50 82 82 82
FU 2 2 2 2 2 2 1 0 2 2 2
... ... ... ... ... ... ... ... ... ... ... ... ...
US WV 25 25 25 25 25 25 25 23 25 25 25
WY 23 23 23 23 23 23 23 22 23 23 23
VN HN 6 6 6 6 6 6 6 6 6 6 6
SG 19 19 19 19 19 19 19 17 19 19 19
ZA GT 3 3 3 3 3 3 3 2 3 3 3

545 rows × 11 columns