数据分析第三篇——Pandas之数据的分割与聚合
程序员文章站
2024-03-07 14:41:15
...
3.10 高级处理-分割与聚合
3.10.1 什么是分组与聚合
3.10.2 分组与聚合的API
- DataFrame.groupby(by=分组依据的列标签, as_index=False)[聚合依据的DataFrame列标签]
- key:分组列的数据,可以有多个
- 案例:不同颜色不同笔的价格数据
- Series的方法:
- 聚合依据的DataFrame列.groupby(分组依据的DataFrame列标签)
col = pd.DataFrame({'color':['white', 'red', 'green', 'red', 'green'],
'object':['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
'price1':[5.56, 4.20, 1.30, 0.56, 2.75],
'price2':[4.75, 4.12, 1.60, 0.75, 3.15]})
col
color | object | price1 | price2 | |
---|---|---|---|---|
0 | white | pen | 5.56 | 4.75 |
1 | red | pencil | 4.20 | 4.12 |
2 | green | pencil | 1.30 | 1.60 |
3 | red | ashtray | 0.56 | 0.75 |
4 | green | pen | 2.75 | 3.15 |
# 进行分组,对颜色分组,price1聚合
# 用DataFrame的方法分组
col.groupby(by='color')
col.groupby(by='color')['price1'].max()
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FD07361A48>
color
green 2.75
red 4.20
white 5.56
Name: price1, dtype: float64
# 用Series的方法进行分组
col['price1'].groupby(col['color']).max()
color
green 2.75
red 4.20
white 5.56
Name: price1, dtype: float64
3.10.3 星巴克零售店铺数据案例
starbucks = pd.read_csv(r'./data/directory.csv')
starbucks.head()
Brand | Store Number | Store Name | Ownership Type | Street Address | City | State/Province | Country | Postcode | Phone Number | Timezone | Longitude | Latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Starbucks | 47370-257954 | Meritxell, 96 | Licensed | Av. Meritxell, 96 | Andorra la Vella | 7 | AD | AD500 | 376818720 | GMT+1:00 Europe/Andorra | 1.53 | 42.51 |
1 | Starbucks | 22331-212325 | Ajman Drive Thru | Licensed | 1 Street 69, Al Jarf | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.42 |
2 | Starbucks | 47089-256771 | Dana Mall | Licensed | Sheikh Khalifa Bin Zayed St. | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.39 |
3 | Starbucks | 22126-218024 | Twofour 54 | Licensed | Al Salam Street | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.38 | 24.48 |
4 | Starbucks | 17127-178586 | Al Ain Tower | Licensed | Khaldiya Area, Abu Dhabi Island | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.54 | 24.51 |
# 按国家分组,求出每个国家的星巴克零售店数量
starbucks.groupby(by='Country').count()['Brand']
Country
AD 1
AE 144
AR 108
AT 18
AU 22
...
TT 3
TW 394
US 13608
VN 25
ZA 3
Name: Brand, Length: 73, dtype: int64
starbucks.groupby(by='Country').count()['Brand'].sort_values(ascending=False)[:10].plot(kind='bar', figsize=(20, 8), fontsize=20)
按照国家和省市分组
starbucks.groupby(by=['Country', 'State/Province'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FD0D190188>
starbucks.groupby(by=['Country', 'State/Province']).count()
Brand | Store Number | Store Name | Ownership Type | Street Address | City | Postcode | Phone Number | Timezone | Longitude | Latitude | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | State/Province | |||||||||||
AD | 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
AE | AJ | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 2 | 2 | 2 |
AZ | 48 | 48 | 48 | 48 | 48 | 48 | 7 | 20 | 48 | 48 | 48 | |
DU | 82 | 82 | 82 | 82 | 82 | 82 | 16 | 50 | 82 | 82 | 82 | |
FU | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 0 | 2 | 2 | 2 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
US | WV | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 23 | 25 | 25 | 25 |
WY | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 22 | 23 | 23 | 23 | |
VN | HN | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
SG | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 17 | 19 | 19 | 19 | |
ZA | GT | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 3 | 3 | 3 |
545 rows × 11 columns
上一篇: 不利用第三个变量,交换两个变量的值