欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

20200915电商数据分析 - 偏逻辑思维(一)

程序员文章站 2022-04-06 23:37:24
...

这是一个电商数据分析题目,可以用excel做,可以用sql做,可以用python做

题目如下:
20200915电商数据分析 - 偏逻辑思维(一)
数据源如下:
链接:https://pan.baidu.com/s/1gM2NoS_1SxUCgPbf-9mgYg
提取码:9zrs

Excel篇:其他人用数据透析表做的
SQL篇:我遇到一些问题,重新用pyhton做
Python篇:如下

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('数据源.csv')
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3057: DtypeWarning: Columns (1,2,3,4,7,8) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
df = data.copy()
data.dropna(how="all",inplace=True)  #删除全部是Null值的行
data.drop_duplicates(inplace=True)  #删除重复行
data.head()
id week date order_d order_e quantity amount phone account
0 1.0 NaN 2017/6/8 GU20170608923396-1 GU20170608923396 1.0 2750.0 1500026566 aaa@qq.com
1 2.0 NaN 2017/6/8 GU20170608923428-1 GU20170608923428 1.0 2250.0 16201251750 aaa@qq.com
2 3.0 NaN 2017/6/8 GU20170608923430-1 GU20170608923430 1.0 3900.0 15010529091 aaa@qq.com
3 4.0 NaN 2017/6/8 GU20170608923431-1 GU20170608923431 1.0 3300.0 12621601625 aaa@qq.com
4 5.0 NaN 2017/6/8 GU20170608923435-1 GU20170608923435 1.0 2250.0 12917062526 aaa@qq.com
data.date = pd.to_datetime(df.date,format='%Y-%m-%d')
data.phone = df.phone.astype('str')
data.describe()
id quantity amount
count 293657.000000 293657.000000 293657.000000
mean 146829.000000 1.008050 4533.377958
std 84771.618337 0.259776 3904.373436
min 1.000000 1.000000 350.000000
25% 73415.000000 1.000000 1750.000000
50% 146829.000000 1.000000 3650.000000
75% 220243.000000 1.000000 6450.000000
max 293657.000000 62.000000 71500.000000
len(data)
293657
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 293657 entries, 0 to 293656
Data columns (total 9 columns):
id          293657 non-null float64
week        5491 non-null object
date        293657 non-null datetime64[ns]
order_d     293657 non-null object
order_e     293657 non-null object
quantity    293657 non-null float64
amount      293657 non-null float64
phone       293657 non-null object
account     254716 non-null object
dtypes: datetime64[ns](1), float64(3), object(5)
memory usage: 22.4+ MB
data.isnull().sum()  #查看列缺失值
id               0
week        288166
date             0
order_d          0
order_e          0
quantity         0
amount           0
phone            0
account      38941
dtype: int64
  • 列account有缺失值,Week只有本周有数值,之前没有数值
len(data.account.value_counts()) #唯一account一共有199179
199179
len(data.phone.value_counts()) #唯一phone一共有234490,这里phone表示下单,account表示注册,下单用户大于注册用户正常
234490
df_week = data[data.week == 'Sep W36']   #Sep W36一共有5491条数据
len(df_week.phone.value_counts())  #Sep W36品牌周,一共有5054个用户购买,这里以一个手机号是一个用户来统计用户个数
5054
len(df_week)
5491
len(df_history)
288166
df_history = data[data.week != 'Sep W36']#品牌周之前的数据,一共288166条消费记录
len(df_history.phone.value_counts())
230260

第一部分:

  • 本周新客人数(开店至今本周第一次购买):4230人
  • 本周一共有5491条消费记录,用户5054人,新客4230,老客824人,新客消费4483条,老客消费1008条
  • 本周新客消费记录保存在df_week,本周老客消费记录保存在df_history
df2 = pd.merge(df_week,df_history,how='left',on='phone') #_x为week的数据,_y为history数据
df2.head()
id_x week_x date_x order_d_x order_e_x quantity_x amount_x phone account_x id_y week_y date_y order_d_y order_e_y quantity_y amount_y account_y
0 288167.0 Sep W36 2020-08-30 GU202008291446552-1 GU202008291446552 1.0 6900.0 15120002542 15120002542 NaN NaN NaT NaN NaN NaN NaN NaN
1 288168.0 Sep W36 2020-08-30 GU202008291446404-1 GU202008291446404 1.0 10200.0 15941716566 15714177539 NaN NaN NaT NaN NaN NaN NaN NaN
2 288169.0 Sep W36 2020-08-30 GU202008291447049-1 GU202008291447049 1.0 6900.0 12721266779 12721236779 NaN NaN NaT NaN NaN NaN NaN NaN
3 288170.0 Sep W36 2020-08-30 GU202008291446576-1 GU202008291446576 1.0 2900.0 16091666427 13091333427 NaN NaN NaT NaN NaN NaN NaN NaN
4 288171.0 Sep W36 2020-08-30 GU202008291446393-1 GU202008291446393 1.0 14300.0 17627600559 17627600559 95877.0 NaN 2019-08-26 GU201908261110787-1 GU201908261110787 1.0 10500.0 17627600559
len(df2)
10519
df2.id_y.isnull().sum()
4483
  • 把于本周第一次消费的数据提取出来,保存在df_sep_first
  • 本周合计销售金额:25653800
  • 本周新客销售金额:20929940
  • 本周新客销售占比 81.59%
df_week.amount.sum()
25653800.0
df_sep_first = df2[df2.id_y.isnull()]
len(df_sep_first)
#df_sep_first = df_sep_first[['id_x','week_x','date_x','order_d_x','order_e_x','amount_x','phone','account_x']]
#df_sep_first.head()
4483
df_sep_first.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4483 entries, 0 to 10518
Data columns (total 17 columns):
id_x          4483 non-null float64
week_x        4483 non-null object
date_x        4483 non-null datetime64[ns]
order_d_x     4483 non-null object
order_e_x     4483 non-null object
quantity_x    4483 non-null float64
amount_x      4483 non-null float64
phone         4483 non-null object
account_x     3961 non-null object
id_y          0 non-null float64
week_y        0 non-null object
date_y        0 non-null datetime64[ns]
order_d_y     0 non-null object
order_e_y     0 non-null object
quantity_y    0 non-null float64
amount_y      0 non-null float64
account_y     0 non-null object
dtypes: datetime64[ns](2), float64(6), object(9)
memory usage: 630.4+ KB
df_sep_first.amount_x.sum()
20929940.0
  • 本周新客会员人数,即在df_sep_first中,account有多少个,3961
df_sep_first.account_x.count()
3961

第二部分:

  • 近一年累计消费金额≥¥68,000的总人数:202 #创建满足1的数据框df_oneyear2
  • 满足1条件的本周产生购买的新产生客人数:
  • 满足1条件的本周产生购买的原达标客人数:
  • 满足上述2个条件的人数消费总金额:
data.date.max()
Timestamp('2020-09-05 00:00:00')
df_oneyear1 = data[(data.date < '2019-09-05')]
(df_oneyear1.pivot_table(index='phone',values='amount',aggfunc='sum').amount > 68000).sum()
202
  • 下面把202条数据提取出来到df_oneyear2
df_oneyear2 = df_oneyear1.pivot_table(index='phone',values='amount',aggfunc='sum')
df_oneyear2 = df_oneyear2[df_oneyear2.amount > 68000]
len(df_oneyear2)
202

思路:

  • 难点在于,这里是聚合求和,满足近1年消费大于68000的聚合求和,已经存到df_oneyear2
  • 本周数据df_week,其中新客df_sep_first,减去之后即为本周消费的老客,这里我们先看本周之内全部满足1条件的人数
  • 先对df_week进行聚合求和,然后再和df_oneyear2相交,查到本周内全部累计消费》68000的用户
  • 这里易错点,拿df_week直接和df_oneyear2相交,结果不对,因为没有聚合求和
# df_week.date 结果是2020-08-30至2020-09-05,不需要考虑1年以内这个条件
df_week.pivot_table(index='phone',values='amount',aggfunc='sum').sort_values(by='amount',ascending=False).head()
amount
phone
12721265027 67600.0
12611996246 49000.0
15626452464 42500.0
16771162166 37000.0
16261501212 34500.0
  • 结果发现本周之内累计消费最高67600,都不满足68000,所以第2和第3都是0人

第三部分

  • 近一年有多次消费,且至少有一笔消费金额≥¥18,000
  • 近一年仅一次消费,且该笔消费金额≥¥18,000
  • 近一年累计消费金额<¥68,000

  • 满足上述条件的总人数:
  • 满足1条件的本周产生购买的新产生客人数:
  • 满足1条件的本周产生购买的原达标客人数:
  • 满足上述2个条件的人数消费总金额:

思路:第一题的总人数,先求出3个条件每一个满足条件的人数,再求和

df_oneyear1.head()
id week date order_d order_e quantity amount phone account
0 1.0 NaN 2017-06-08 GU20170608923396-1 GU20170608923396 1.0 2750.0 1500026566 aaa@qq.com
1 2.0 NaN 2017-06-08 GU20170608923428-1 GU20170608923428 1.0 2250.0 16201251750 aaa@qq.com
2 3.0 NaN 2017-06-08 GU20170608923430-1 GU20170608923430 1.0 3900.0 15010529091 aaa@qq.com
3 4.0 NaN 2017-06-08 GU20170608923431-1 GU20170608923431 1.0 3300.0 12621601625 aaa@qq.com
4 5.0 NaN 2017-06-08 GU20170608923435-1 GU20170608923435 1.0 2250.0 12917062526 aaa@qq.com
a = df_oneyear1[df_oneyear1.amount > 18000]
a.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1438 entries, 51 to 98105
Data columns (total 9 columns):
id          1438 non-null float64
week        0 non-null object
date        1438 non-null datetime64[ns]
order_d     1438 non-null object
order_e     1438 non-null object
quantity    1438 non-null float64
amount      1438 non-null float64
phone       1438 non-null object
account     1096 non-null object
dtypes: datetime64[ns](1), float64(3), object(5)
memory usage: 112.3+ KB
a.phone.value_counts() > 1
#遇到问题,不知道怎么计算大于1的数量,待查
16262121224     True
17621541994     True
12647119726     True
16129000006     True
16652119121     True
12064295506     True
12520221500     True
16964522200     True
16521512766     True
16612052679     True
16996176222     True
12500226762     True
15222227475     True
16201967491     True
12405291559     True
16570712222     True
15267120909     True
15210924674     True
16606609226     True
16552005664     True
16552401027     True
16752720064     True
15625676110     True
16916976602     True
15210951271     True
12676106654     True
16916055229     True
16264169746     True
12610649665     True
12662560267     True
               ...  
16466210740    False
12600679612    False
12001729520    False
12420052226    False
15907952196    False
15625229997    False
16921950066    False
12696912090    False
16979201079    False
12100221766    False
16201609226    False
16261646692    False
16611225626    False
15927546424    False
15912621004    False
17600446922    False
12260064657    False
16727962992    False
16962049492    False
19921426110    False
16217566946    False
12210562700    False
16296267629    False
15144694469    False
16166279955    False
12970221916    False
16477045216    False
15292277991    False
15207910610    False
15157020292    False
Name: phone, Length: 1329, dtype: bool