电商数据分析案例

程序员文章站 2024-03-07 17:55:27

...

题目：电商数据分析案例

数据集来源

本数据集来自kaggle，数据集链接如下：
https://www.kaggle.com/carrie1/ecommerce-data

分析目的：了解该电商平台的销售情况（该网站是一个以批发为主的电商平台）

分析思路框架：

1、产品维度
2、顾客维度
3、时间维度

查看数据集信息

import pandas as pd
import os
import matplotlib.pyplot as plt
os.chdir("F:\jupyter\E-commerce-data")

data = pd.read_csv("data.csv",encoding="ISO-8859-1")

data.head()

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
0	536365	85123A	WHITE HANGING HEART T-LIGHT HOLDER	6	12/1/2010 8:26	2.55	17850.0	United Kingdom
1	536365	71053	WHITE METAL *	6	12/1/2010 8:26	3.39	17850.0	United Kingdom
2	536365	84406B	CREAM CUPID HEARTS COAT HANGER	8	12/1/2010 8:26	2.75	17850.0	United Kingdom
3	536365	84029G	KNITTED UNION FLAG HOT WATER BOTTLE	6	12/1/2010 8:26	3.39	17850.0	United Kingdom
4	536365	84029E	RED WOOLLY HOTTIE WHITE HEART.	6	12/1/2010 8:26	3.39	17850.0	United Kingdom

数据集共有8个变量，分别为订单编号、产品编号、产品描述、购买数量、购买时间、产品单价、顾客ID、顾客所在地区

数据预处理

数据缺失值处理

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   InvoiceNo    541909 non-null  object 
 1   StockCode    541909 non-null  object 
 2   Description  540455 non-null  object 
 3   Quantity     541909 non-null  int64  
 4   InvoiceDate  541909 non-null  object 
 5   UnitPrice    541909 non-null  float64
 6   CustomerID   406829 non-null  float64
 7   Country      541909 non-null  object 
dtypes: float64(2), int64(1), object(5)
memory usage: 33.1+ MB

更改ID的数据类型

data["CustomerID"] = data["CustomerID"].astype("str")

产品描述和顾客ID有缺失变量，由于本次分析不涉及产品描述，故将其删掉。缺失的顾客ID用unknown表示

data = data.drop(["Description"],axis = 1)
data["CustomerID"] = data["CustomerID"].fillna("unknown")

处理重复值

data.duplicated().sum()

data = data.drop_duplicates()

处理异常值

data.describe()

	Quantity	UnitPrice
count	536639.000000	536639.000000
mean	9.619500	4.632660
std	219.130206	97.233299
min	-80995.000000	-11062.060000
25%	1.000000	1.250000
50%	3.000000	2.080000
75%	10.000000	4.130000
max	80995.000000	38970.000000

产品数量为负表示退货，产品单价为负没有意义，为异常值

data[data["UnitPrice"]<0].count()

InvoiceNo      0
StockCode      0
Quantity       0
InvoiceDate    0
UnitPrice      0
CustomerID     0
Country        0
dtype: int64

data = data[data["UnitPrice"]>=0]

data[data["UnitPrice"]==0].count()

InvoiceNo      2515
StockCode      2515
Quantity       2515
InvoiceDate    2515
UnitPrice      2515
CustomerID     2515
Country        2515
dtype: int64

单价为0的产品可能是赠品

变量处理

data["Total"]=data["Quantity"]*data["UnitPrice"]

data[["Date","Time"]]=data["InvoiceDate"].str.split(" ",expand=True)

data[["Month","Day","Year"]] = data["Date"].str.split("/",expand=True)

data["Hour"]=data["Time"].str.split(":",expand=True)[0].astype("int")

将数据集分为两部分，一部分是产品购买信息，一部分是退货信息

data_buy = data[data["Quantity"] >= 0]
data_return = data[data["Quantity"] < 0]

数据分析

PART ONE 产品维度

Question 1: 最热卖的产品是哪些？退货最多的产品是哪些？

product_quantity=data["Quantity"].groupby(data["StockCode"]).sum().sort_values(ascending=False)

product_quantity[:20]

StockCode
22197     56427
84077     53751
85099B    47260
85123A    38811
84879     36122
21212     36016
23084     30597
22492     26437
22616     26299
21977     24719
22178     23825
17003     23017
15036     22552
21915     22036
22386     20992
23203     19695
22086     18876
84946     18874
20725     18858
84991     18015
Name: Quantity, dtype: int64

product_quantity[-20:]

StockCode
D          -1194
23113      -1232
35610A     -1267
23115      -1288
23114      -1294
84598      -1313
23117      -1348
23116      -1364
22351      -1387
23270      -1616
22618      -1632
79323B     -1671
79323P     -2007
23059      -2376
72732      -2472
79323LP    -2618
79323W     -4838
72140F     -5368
23003      -8516
23005     -14468
Name: Quantity, dtype: int64

Question 2: 商品的价格分布如何？价格与销量有何关系？

data_unique_stock=data.drop_duplicates(["StockCode"])

data_unique_stock["UnitPrice"].describe()

count     4070.000000
mean         6.905278
std        173.775142
min          0.000000
25%          1.250000
50%          2.510000
75%          4.250000
max      11062.060000
Name: UnitPrice, dtype: float64

商品价格平均数是6.9，中位数是2.51，中位数>平均数，是一个右偏分布，也就是说该网站售卖的绝大部分都是低价商品，极少部分商品价格高昂，导致商品价格的标准差较大，总之该电商网站以批发低价商品为主

price_bin=pd.cut(data_unique_stock["UnitPrice"],bins=[0,1,2,3,4,5,6,7,10,15,20,25,30,100,10000,20000,30000,40000]).value_counts().sort_index()

price_per=price_bin/price_bin.sum()

price_cumper=price_bin.cumsum()/price_bin.sum()

pd.concat([price_per,price_cumper],axis=1)

	UnitPrice	UnitPrice
(0, 1]	0.181323	0.181323
(1, 2]	0.231907	0.413230
(2, 3]	0.190143	0.603372
(3, 4]	0.078859	0.682231
(4, 5]	0.089235	0.771466
(5, 6]	0.050324	0.821790
(6, 7]	0.023606	0.845396
(7, 10]	0.087419	0.932815
(10, 15]	0.037095	0.969909
(15, 20]	0.012970	0.982879
(20, 25]	0.004150	0.987030
(25, 30]	0.004929	0.991958
(30, 100]	0.005966	0.997925
(100, 10000]	0.001816	0.999741
(10000, 20000]	0.000259	1.000000
(20000, 30000]	0.000000	1.000000
(30000, 40000]	0.000000	1.000000

具体来看，商品价格在（1，2]的价格区间里是最多的,占23.19%；商品价格在10元以内的占了93.28%，在100元以内的占了99.79%。

cut=pd.cut(data["UnitPrice"],bins=[0,1,2,3,4,5,6,7,10,15,20,25,30,100,10000,20000,30000,40000])

quantity_price=data["Quantity"].groupby(cut).sum()

quantity_price.cumsum()/quantity_price.sum()

UnitPrice
(0, 1]            0.391199
(1, 2]            0.716792
(2, 3]            0.855267
(3, 4]            0.900399
(4, 5]            0.950368
(5, 6]            0.962797
(6, 7]            0.970164
(7, 10]           0.988817
(10, 15]          0.997159
(15, 20]          0.998879
(20, 25]          0.999407
(25, 30]          0.999606
(30, 100]         0.999869
(100, 10000]      1.000001
(10000, 20000]    1.000000
(20000, 30000]    1.000000
(30000, 40000]    1.000000
Name: Quantity, dtype: float64

(quantity_price.cumsum()/quantity_price.sum()).plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a786bd278>

电商数据分析案例

71.68%的商品销量集中在价格在2元以内的商品；99.7%的商品销量集中的价格在15元以内的商品

Question 3:退货商品的价格分布与退货量情况是怎样的？

data_return_unique_stock=data_return.drop_duplicates("StockCode")

return_cut=pd.cut(data_return_unique_stock["UnitPrice"],bins=[0,1,2,3,4,5,10,15,20,50,100,200,500,1000]).value_counts().sort_index()

pd.concat([return_cut/return_cut.sum(),return_cut.cumsum()/return_cut.sum()],axis=1)

	UnitPrice	UnitPrice
(0, 1]	0.227027	0.227027
(1, 2]	0.283243	0.510270
(2, 3]	0.172432	0.682703
(3, 4]	0.070270	0.752973
(4, 5]	0.079459	0.832432
(5, 10]	0.122162	0.954595
(10, 15]	0.027027	0.981622
(15, 20]	0.006486	0.988108
(20, 50]	0.007027	0.995135
(50, 100]	0.002703	0.997838
(100, 200]	0.001081	0.998919
(200, 500]	0.000541	0.999459
(500, 1000]	0.000541	1.000000

退货商品最多的价格区间是（1，2]元，退货商品价格在10元以内的占了95.46%。

(return_cut/return_cut.sum()).plot()
(price_bin/price_bin.sum()).plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a007ae898>

电商数据分析案例

红色是退货商品的价格分布，蓝色是全部商品的价格分布。从退货商品种类的角度，退货商品与总体商品相比，更集中在0-3元与5-10元这两个区间

return_quantity_price=data_return["Quantity"].groupby(pd.cut(data_return["UnitPrice"],bins=[0,1,2,3,4,5,10,15,20,50,100,200,500,1000])).sum()

return_quantity_price/return_quantity_price.sum()

UnitPrice
(0, 1]         0.161663
(1, 2]         0.405996
(2, 3]         0.371273
(3, 4]         0.014244
(4, 5]         0.016029
(5, 10]        0.022745
(10, 15]       0.005848
(15, 20]       0.000672
(20, 50]       0.000715
(50, 100]      0.000232
(100, 200]     0.000203
(200, 500]     0.000243
(500, 1000]    0.000138
Name: Quantity, dtype: float64

(return_quantity_price/return_quantity_price.sum()).plot()
(quantity_price/quantity_price.sum()).plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a03035cf8>

电商数据分析案例

1-10元的商品退货量会偏高一些

PART TWO 顾客维度

Question 4 :哪些顾客购买金额最多？哪些顾客购买次数最频繁？如何根据这两个变量对顾客分类？

customer_total=data_buy["Total"].groupby(data_buy["CustomerID"]).sum().sort_values(ascending=False)
customer_total[:20]

CustomerID
nan        1.754902e+06
14646.0    2.802060e+05
18102.0    2.596573e+05
17450.0    1.943908e+05
16446.0    1.684725e+05
14911.0    1.437112e+05
12415.0    1.249145e+05
14156.0    1.172101e+05
17511.0    9.106238e+04
16029.0    8.085084e+04
12346.0    7.718360e+04
16684.0    6.665356e+04
14096.0    6.516479e+04
13694.0    6.503962e+04
15311.0    6.063275e+04
13089.0    5.876208e+04
17949.0    5.851048e+04
15769.0    5.625272e+04
15061.0    5.453414e+04
14298.0    5.152730e+04
Name: Total, dtype: float64

customer_buy_fre=data_buy.drop_duplicates(["InvoiceNo"])["InvoiceNo"].groupby(data_buy["CustomerID"]).count().sort_values(ascending=False)
customer_buy_fre[:20]

CustomerID
nan        2190
12748.0     210
14911.0     201
17841.0     124
13089.0      97
14606.0      93
15311.0      91
12971.0      86
14646.0      74
16029.0      63
13408.0      62
18102.0      60
13798.0      57
14527.0      55
14156.0      55
16422.0      51
13694.0      50
15061.0      48
15039.0      47
16013.0      47
Name: InvoiceNo, dtype: int64

customer_total.drop("nan",inplace=True)
customer_buy_fre.drop("nan",inplace=True)

plt.style.use("ggplot")

plt.scatter(customer_buy_fre.sort_index(),customer_total.sort_index())
plt.axhline(y=50000,c="green")
plt.axvline(x=50,c="blue")

<matplotlib.lines.Line2D at 0x26a6c5050b8>

电商数据分析案例

根据顾客购买次数和购买金额，对顾客进行分组，采取不同的营销措施。

Question 5:订单的商品种类分布是怎样的？

order_type=data_buy["StockCode"].groupby(data_buy["InvoiceNo"]).count().sort_values(ascending=False)

pd.cut(order_type,bins=[0,50,100,150,200,250,300,500,1000,1200]).value_counts()

(0, 50]         18552
(50, 100]        1522
(100, 150]        250
(150, 200]        165
(200, 250]         79
(300, 500]         66
(500, 1000]        48
(250, 300]         43
(1000, 1200]        1
Name: StockCode, dtype: int64

未退货的顾客中，大部分的订单商品种类在100以内。

order_type_r=data_return["StockCode"].groupby(data_return["InvoiceNo"]).count().sort_values(ascending=False)

pd.cut(order_type_r,bins=[0,10,20,30,40,50,60,150]).value_counts()

(0, 10]      5068
(10, 20]       73
(20, 30]       19
(30, 40]        9
(60, 150]       1
(50, 60]        1
(40, 50]        1
Name: StockCode, dtype: int64

退货顾客的商品种类大部分都在10以内

Question 6:哪些国家的顾客消费占比最大？哪些国家的顾客数量最多？哪些国家的顾客平均消费最高？

data

	InvoiceNo	StockCode	Quantity	InvoiceDate	UnitPrice	CustomerID	Country	Total	Date	Time	Month	Day	Year
0	536365	85123A	6	12/1/2010 8:26	2.55	17850.0	United Kingdom	15.30	12/1/2010	8:26	12	1	2010
1	536365	71053	6	12/1/2010 8:26	3.39	17850.0	United Kingdom	20.34	12/1/2010	8:26	12	1	2010
2	536365	84406B	8	12/1/2010 8:26	2.75	17850.0	United Kingdom	22.00	12/1/2010	8:26	12	1	2010
3	536365	84029G	6	12/1/2010 8:26	3.39	17850.0	United Kingdom	20.34	12/1/2010	8:26	12	1	2010
4	536365	84029E	6	12/1/2010 8:26	3.39	17850.0	United Kingdom	20.34	12/1/2010	8:26	12	1	2010
...	...	...	...	...	...	...	...	...	...	...	...	...	...
541904	581587	22613	12	12/9/2011 12:50	0.85	12680.0	France	10.20	12/9/2011	12:50	12	9	2011
541905	581587	22899	6	12/9/2011 12:50	2.10	12680.0	France	12.60	12/9/2011	12:50	12	9	2011
541906	581587	23254	4	12/9/2011 12:50	4.15	12680.0	France	16.60	12/9/2011	12:50	12	9	2011
541907	581587	23255	4	12/9/2011 12:50	4.15	12680.0	France	16.60	12/9/2011	12:50	12	9	2011
541908	581587	22138	3	12/9/2011 12:50	4.95	12680.0	France	14.85	12/9/2011	12:50	12	9	2011

536637 rows × 13 columns

country_total = data["Total"].groupby(data["Country"]).sum().sort_values(ascending=False)

country_customer=data["CustomerID"].groupby(data["Country"]).count().sort_values(ascending=False)

country_total_percent=(country_total/country_total.sum())
country_total_percent

Country
United Kingdom          0.840075
Netherlands             0.029203
EIRE                    0.026980
Germany                 0.022725
France                  0.020243
Australia               0.014056
Switzerland             0.005782
Spain                   0.005617
Belgium                 0.004197
Sweden                  0.003753
Japan                   0.003626
Norway                  0.003607
Portugal                0.003006
Finland                 0.002290
Channel Islands         0.002060
Denmark                 0.001925
Italy                   0.001733
Cyprus                  0.001319
Austria                 0.001042
*               0.001016
Singapore               0.000936
Israel                  0.000811
Poland                  0.000740
Unspecified             0.000486
Greece                  0.000483
Iceland                 0.000442
Canada                  0.000376
Malta                   0.000257
United Arab Emirates    0.000195
USA                     0.000178
Lebanon                 0.000174
Lithuania               0.000170
European Community      0.000133
Brazil                  0.000117
RSA                     0.000103
Czech Republic          0.000073
Bahrain                 0.000056
Saudi Arabia            0.000013
Name: Total, dtype: float64

country_customer_percent=country_customer/country_customer.sum()
country_customer_percent

Country
United Kingdom          0.913646
Germany                 0.017666
France                  0.015916
EIRE                    0.015251
Spain                   0.004711
Netherlands             0.004418
Belgium                 0.003855
Switzerland             0.003716
Portugal                0.002814
Australia               0.002344
Norway                  0.002024
Italy                   0.001496
Channel Islands         0.001411
Finland                 0.001295
Cyprus                  0.001139
Sweden                  0.000859
Unspecified             0.000824
Austria                 0.000747
Denmark                 0.000725
Japan                   0.000667
Poland                  0.000635
Israel                  0.000548
USA                     0.000542
*               0.000529
Singapore               0.000427
Iceland                 0.000339
Canada                  0.000281
Greece                  0.000272
Malta                   0.000237
United Arab Emirates    0.000127
European Community      0.000114
RSA                     0.000108
Lebanon                 0.000084
Lithuania               0.000065
Brazil                  0.000060
Czech Republic          0.000056
Bahrain                 0.000035
Saudi Arabia            0.000019
Name: CustomerID, dtype: float64

country_total_percent.drop("United Kingdom",inplace=True)
country_customer_percent.drop("United Kingdom",inplace=True)

plt.scatter(country_total_percent.sort_index(),country_customer_percent.sort_index())

<matplotlib.collections.PathCollection at 0x26a7e033c18>

电商数据分析案例

data["Total"].groupby(data["Country"]).mean().sort_values(ascending=False)

Country
Netherlands             120.059696
Australia               108.910787
Japan                    98.716816
Sweden                   79.360976
Denmark                  48.247147
Lithuania                47.458857
Singapore                39.827031
Lebanon                  37.641778
Brazil                   35.737500
*                34.888169
Norway                   32.378877
Greece                   32.263836
EIRE                     32.135066
Finland                  32.124806
Bahrain                  28.863158
Switzerland              28.266324
United Arab Emirates     27.974706
Israel                   26.877449
Channel Islands          26.520991
Austria                  25.322494
Canada                   24.280662
Iceland                  23.681319
Czech Republic           23.590667
Germany                  23.365978
France                   23.102343
Spain                    21.659822
European Community       21.176230
Poland                   21.152903
Cyprus                   21.045434
Italy                    21.034259
Belgium                  19.773301
Malta                    19.728110
Portugal                 19.405940
RSA                      17.281207
United Kingdom           16.701544
Saudi Arabia             13.117000
Unspecified              10.726109
USA                       5.948179
Name: Total, dtype: float64

这家电商网站有91.36%的顾客来自本国UK，84%的销售额来自本国。销售额占比超过1%的国家有荷兰、爱尔兰、法国、澳大利亚，顾客数占比超过1%的国家有德国、法国、爱尔兰.平均顾客消费最高的国家是爱尔兰、澳大利亚、日本、瑞典.这些国家可以作为开拓海外市场的重点对象。

PART THREE 时间维度

Question 7: 不同月份的销售金额情况？订单情况？每订单平均金额情况？

data["Month"]=data["Month"].astype("int")

data["Total"].groupby(data["Month"]).sum().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a01d48eb8>

电商数据分析案例

data["InvoiceNo"].groupby(data["Month"]).count().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a01a31cf8>

电商数据分析案例

data["Total"].groupby(data["Month"]).mean().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x26a01c265f8>

电商数据分析案例

从1月到12月份的销售总额呈波动上升状态，并且在8月-11月一直以较大的幅度增长，到11月份达到顶峰。订单数量也与销售总金额呈现一样的分布。一年中，平均订单金额波动较大，分别在3月、5月、9月达到了极大值，并且9月份是全年平均订单金额最大的月份。

Question 8: 不同时间段的销售金额情况？订单情况？每订单平均金额情况？

data["Total"].groupby(data["Hour"]).sum().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x257298bb9e8>

电商数据分析案例

data["InvoiceNo"].groupby(data["Hour"]).count().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x257386b6208>

电商数据分析案例

data["Total"].groupby(data["Hour"]).mean().sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x25738195a58>

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-MgORUVVE-1589979700246)(output_98_1.png)]

总销售金额比较高的订单时间主要集中在9-17点，为上班时间，在10点、12点、15点达到极大值。订单数在时间上的分布与销售金额的分布大致一样。平均订单金额最高的时间是7点，其他时间点相差不大。

分析结论

关于产品维度，了解到的情况是：

①销量top20的产品是哪些，退货最多的top20产品是哪些

②产品价格分布表明该网站以批发低价产品为主，93%的产品单价在10元以内.产品单价-销量图表明72%的产品销量集中在2元以内的产品，99%的产品销量集中在15元以内的产品。

③退货产品最多的价格区间是1-2元，与总体相比，退货产品的价格更集中在0-3元和5-10元，退货量最多的产品更集中在10元以内

关于顾客维度，了解到的情况是：

④购买金额最多和购买次数最频繁的顾客是哪些，然后依据购买总金额和购买总次数对顾客进行分类，进行精准营销

⑤全部订单中，大部分成功交易的产品种类在100以内，大部分退货订单的产品种类在10以内

⑥该电商网站91%的顾客来自国内，84%的销售额来自国内，说明相对来说国外的顾客价值更高。除了本国，销售额比较高的国家有荷兰、爱尔兰、法国、澳大利亚，顾客数比较多的国家有德国、法国、爱尔兰，平均顾客消费比较高的国家有爱尔兰、澳大利亚、日本、瑞典。

关于时间维度，了解到的情况是：

⑦在一年中，订单金额和订单数量呈波动上升，11月份达到最大值，这主要是因为12月份、1月份有圣诞节、新年，是消费者购物的狂欢期，而该电商网站的顾客是批发商，必须提前进货。平均订单金额在9月份达到最大值。

⑧由于该网站的顾客并非个人，因而订单下单主要在工作时间。10、12、15点是下单高峰期，应安排足够的人员和其他资源。

上一篇： Mac下安装配置Docker

下一篇： Kaggle （Bike Sharing Demand）top20%

电商数据分析案例

题目：电商数据分析案例

数据集来源

分析目的：了解该电商平台的销售情况（该网站是一个以批发为主的电商平台）

分析思路框架：

查看数据集信息

数据集共有8个变量，分别为订单编号、产品编号、产品描述、购买数量、购买时间、产品单价、顾客ID、顾客所在地区

数据预处理

数据缺失值处理

更改ID的数据类型

产品描述和顾客ID有缺失变量，由于本次分析不涉及产品描述，故将其删掉。缺失的顾客ID用unknown表示

处理重复值

处理异常值

产品数量为负表示退货，产品单价为负没有意义，为异常值

单价为0的产品可能是赠品

变量处理

将数据集分为两部分，一部分是产品购买信息，一部分是退货信息

数据分析

PART ONE 产品维度

Question 1: 最热卖的产品是哪些？退货最多的产品是哪些？

Question 2: 商品的价格分布如何？价格与销量有何关系？

商品价格平均数是6.9，中位数是2.51，中位数>平均数，是一个右偏分布，也就是说该网站售卖的绝大部分都是低价商品，极少部分商品价格高昂，导致商品价格的标准差较大，总之该电商网站以批发低价商品为主

具体来看，商品价格在（1，2]的价格区间里是最多的,占23.19%；商品价格在10元以内的占了93.28%，在100元以内的占了99.79%。

71.68%的商品销量集中在价格在2元以内的商品；99.7%的商品销量集中的价格在15元以内的商品

Question 3:退货商品的价格分布与退货量情况是怎样的？

退货商品最多的价格区间是（1，2]元，退货商品价格在10元以内的占了95.46%。

红色是退货商品的价格分布，蓝色是全部商品的价格分布。从退货商品种类的角度，退货商品与总体商品相比，更集中在0-3元与5-10元这两个区间

1-10元的商品退货量会偏高一些

PART TWO 顾客维度

Question 4 :哪些顾客购买金额最多？哪些顾客购买次数最频繁？如何根据这两个变量对顾客分类？

根据顾客购买次数和购买金额，对顾客进行分组，采取不同的营销措施。

Question 5:订单的商品种类分布是怎样的？

未退货的顾客中，大部分的订单商品种类在100以内。

退货顾客的商品种类大部分都在10以内

Question 6:哪些国家的顾客消费占比最大？哪些国家的顾客数量最多？哪些国家的顾客平均消费最高？

PART THREE 时间维度

Question 7: 不同月份的销售金额情况？订单情况？每订单平均金额情况？

Question 8: 不同时间段的销售金额情况？订单情况？每订单平均金额情况？

总销售金额比较高的订单时间主要集中在9-17点，为上班时间，在10点、12点、15点达到极大值。订单数在时间上的分布与销售金额的分布大致一样。平均订单金额最高的时间是7点，其他时间点相差不大。

分析结论

关于产品维度，了解到的情况是：

①销量top20的产品是哪些，退货最多的top20产品是哪些

②产品价格分布表明该网站以批发低价产品为主，93%的产品单价在10元以内.产品单价-销量图表明72%的产品销量集中在2元以内的产品，99%的产品销量集中在15元以内的产品。

③退货产品最多的价格区间是1-2元，与总体相比，退货产品的价格更集中在0-3元和5-10元，退货量最多的产品更集中在10元以内

关于顾客维度，了解到的情况是：

④购买金额最多和购买次数最频繁的顾客是哪些，然后依据购买总金额和购买总次数对顾客进行分类，进行精准营销

⑤全部订单中，大部分成功交易的产品种类在100以内，大部分退货订单的产品种类在10以内

关于时间维度，了解到的情况是：

⑦在一年中，订单金额和订单数量呈波动上升，11月份达到最大值，这主要是因为12月份、1月份有圣诞节、新年，是消费者购物的狂欢期，而该电商网站的顾客是批发商，必须提前进货。平均订单金额在9月份达到最大值。

⑧由于该网站的顾客并非个人，因而订单下单主要在工作时间。10、12、15点是下单高峰期，应安排足够的人员和其他资源。

电商数据分析案例

Python数据分析 CDnow网站的用户购买行为分析案例(RFM分析模型)

网红营销案例：吃播女王密子君、电商钱夫人雪梨

电影评分数据分析案例-[Java]

电商平台运营策划方案实战案例

Python数据分析案例实战

Python数据分析入门案例

GA 电商数据分析实践课

案例实战：每日上亿请求量的电商系统，JVM年轻代垃圾回收参数如何优化？

双十一，电商直播怎么玩？映客四个案例让你秒懂