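Data Analysis: A Summary of Common Seaborn Plotting Methods | Multiple Ways to Plot Trends, Relationships, and Distributions | 20-Minute Crash Course | Kaggle Learning (Part 1)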

程序员文章站 2022-06-15 23:47:06

...

Trends

- A trend is defined as a pattern of change.
- sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.

Relationship

- There are many different chart types that you can use to understand relationships between variables in your data.
- sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
- sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
- sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
- sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
- sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
- sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.

Distribution

- We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
- sns.distplot - Histograms show the distribution of a single numerical variable.
- sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
- sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

1. Line Chart

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Path of the file to read
spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)

spotify_data.tail()

	Shape of You	Despacito	Something Just Like This	HUMBLE.	Unforgettable
Date
2018-01-05	4492978	3450315.0	2408365.0	2685857.0	2869783.0
2018-01-06	4416476	3394284.0	2188035.0	2559044.0	2743748.0
2018-01-07	4009104	3020789.0	1908129.0	2350985.0	2441045.0
2018-01-08	4135505	2755266.0	2023251.0	2523265.0	2622693.0
2018-01-09	4168506	2791601.0	2058016.0	2727678.0	2627334.0

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2bb6f98>


# Set the figure size to 14 inches wide by 6 inches tall
plt.figure(figsize=(14, 6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2a74780>


Changing styles

# Seaborn has five different themes: (1) "darkgrid", (2) "whitegrid", (3) "dark", (4) "white", and (5) "ticks"
# Change the style of the figure to the "dark" theme
sns.set_style("dark")

# Line chart 
plt.figure(figsize=(12,6))
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7f5faa4bc828>


Plot a subset of the data

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")


2. Bar Charts

# Print the data
flight_data

	AA	AS	B6	DL	EV	F9	HA	MQ	NK	OO	UA	US	VX	WN
Month
1	6.955843	-0.320888	7.347281	-2.043847	8.537497	18.357238	3.512640	18.164974	11.398054	10.889894	6.352729	3.107457	1.420702	3.389466
2	7.530204	-0.782923	18.657673	5.614745	10.417236	27.424179	6.029967	21.301627	16.474466	9.588895	7.260662	7.114455	7.784410	3.501363
3	6.693587	-0.544731	10.741317	2.077965	6.730101	20.074855	3.468383	11.018418	10.039118	3.181693	4.892212	3.330787	5.348207	3.263341
4	4.931778	-3.009003	2.780105	0.083343	4.821253	12.640440	0.011022	5.131228	8.766224	3.223796	4.376092	2.660290	0.995507	2.996399
5	5.173878	-1.716398	-0.709019	0.149333	7.724290	13.007554	0.826426	5.466790	22.397347	4.141162	6.827695	0.681605	7.102021	5.680777
6	8.191017	-0.220621	5.047155	4.419594	13.952793	19.712951	0.882786	9.639323	35.561501	8.338477	16.932663	5.766296	5.779415	10.743462
7	3.870440	0.377408	5.841454	1.204862	6.926421	14.464543	2.001586	3.980289	14.352382	6.790333	10.262551	NaN	7.135773	10.504942
8	3.193907	2.503899	9.280950	0.653114	5.154422	9.175737	7.448029	1.896565	20.519018	5.606689	5.014041	NaN	5.106221	5.532108
9	-1.432732	-1.813800	3.539154	-3.703377	0.851062	0.978460	3.696915	-2.167268	8.000101	1.530896	-1.794265	NaN	0.070998	-1.336260
10	-0.580930	-2.993617	3.676787	-5.011516	2.303760	0.082127	0.467074	-3.735054	6.810736	1.750897	-2.456542	NaN	2.254278	-0.688851
11	0.772630	-1.916516	1.418299	-3.175414	4.415930	11.164527	-2.719894	0.220061	7.543881	4.925548	0.281064	NaN	0.116370	0.995684
12	4.149684	-1.846681	13.839290	2.504595	6.685176	9.346221	-1.706475	0.662486	12.733123	10.947612	7.012079	NaN	13.498720	6.720893

# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")

Text(0, 0.5, 'Arrival delay (in minutes)')
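For readers without the flight dataset, the bar chart can be reproduced on the first six NK values from the table printed above. This is a sketch rather than the original notebook code: flight_subset and worst_month are names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Months 1-6 of the NK column, copied from the printed table
flight_subset = pd.DataFrame(
    {"NK": [11.398054, 16.474466, 10.039118, 8.766224, 22.397347, 35.561501]},
    index=pd.Index(range(1, 7), name="Month"),
)

plt.figure(figsize=(10, 6))
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# The index supplies the categories, the column supplies the bar heights
ax = sns.barplot(x=flight_subset.index, y=flight_subset["NK"])
ax.set_ylabel("Arrival delay (in minutes)")

# The tallest bar can be cross-checked numerically
worst_month = flight_subset["NK"].idxmax()
```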


3. Heatmap

# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

Text(0.5, 42.0, 'Airline')
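Since the delay table mixes positive and negative values, a diverging palette centered at zero may make early and late arrivals easier to tell apart. The sketch below uses a 3x3 corner of the printed table; the choice of cmap="coolwarm", center=0, and fmt=".1f" is an assumption made here, not something the original notebook does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Months 1-3 for three airlines, copied from the printed table
delays = pd.DataFrame(
    {
        "AA": [6.955843, 7.530204, 6.693587],
        "AS": [-0.320888, -0.782923, -0.544731],
        "B6": [7.347281, 18.657673, 10.741317],
    },
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(6, 4))
# annot=True writes each value in its cell; fmt controls the number format
ax = sns.heatmap(data=delays, annot=True, fmt=".1f", cmap="coolwarm", center=0)
ax.set_xlabel("Airline")
```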


4. Scatter Plots

insurance_data.head()

	age	sex	bmi	children	smoker	region	charges
0	19	female	27.900	0	yes	southwest	16884.92400
1	18	male	33.770	1	no	southeast	1725.55230
2	28	male	33.000	3	no	southeast	4449.46200
3	33	male	22.705	0	no	northwest	21984.47061
4	32	male	28.880	0	no	northwest	3866.85520

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f2300048>


# Add a regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f222c588>


Color-coded scatter plots

# Color-code the points by 'smoker' and plot the other two columns ('bmi', 'charges') on the axes
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f19b49e8>


# add two regression lines, corresponding to smokers and nonsmokers
# Instead of setting x=insurance_data['bmi'] to select the 'bmi' column in insurance_data, we set x="bmi" to specify the name of the column only.
# Similarly, y="charges" and hue="smoker" also contain the names of columns.
# We specify the dataset with data=insurance_data.

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

<seaborn.axisgrid.FacetGrid at 0x7f44f192d668>
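The column-name style used with sns.lmplot is also accepted by sns.scatterplot. The sketch below rebuilds the five rows shown by insurance_data.head() and passes column names together with data=; insurance_head is a name introduced here, and five rows are far too few to read anything into the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five rows printed by insurance_data.head(), limited to the columns used here
insurance_head = pd.DataFrame(
    {
        "age": [19, 18, 28, 33, 32],
        "bmi": [27.900, 33.770, 33.000, 22.705, 28.880],
        "smoker": ["yes", "no", "no", "no", "no"],
        "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
    }
)

plt.figure(figsize=(6, 4))
# Column names are looked up in the DataFrame passed as data=
ax = sns.scatterplot(x="bmi", y="charges", hue="smoker", data=insurance_head)
```

One side effect of this style is that the axis labels and legend title are taken from the column names automatically.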


# Categorical scatter plot of charges for smokers and non-smokers
sns.swarmplot(x=insurance_data['smoker'],
              y=insurance_data['charges'])


5. Histograms

	Sepal Length (cm)	Sepal Width (cm)	Petal Length (cm)	Petal Width (cm)	Species
Id
1	5.1	3.5	1.4	0.2	Iris-setosa
2	4.9	3.0	1.4	0.2	Iris-setosa
3	4.7	3.2	1.3	0.2	Iris-setosa
4	4.6	3.1	1.5	0.2	Iris-setosa
5	5.0	3.6	1.4	0.2	Iris-setosa

# 'a' selects the column to plot
# Always pass kde=False when creating a histogram; leaving it out also draws a KDE curve over the bars.
sns.distplot(a=iris_data['Petal Length (cm)'], kde=False)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5b1da20>


Color-coded plots

# Histograms for each species
sns.distplot(a=iris_set_data['Petal Length (cm)'], label="Iris-setosa", kde=False)
sns.distplot(a=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", kde=False)
sns.distplot(a=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", kde=False)

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# Force legend to appear
plt.legend()

<matplotlib.legend.Legend at 0x7f96c5849470>


6. Density plots

# A kernel density estimate (KDE) plot is like a smoothed histogram
# 'shade=True' colors the area below the curve
sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5a664e0>


# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

<seaborn.axisgrid.JointGrid at 0x7f96c59cbef0>

The color-coding shows how likely we are to see different combinations of sepal width and petal length; darker parts of the figure are more likely.

The curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, iris_data['Petal Length (cm)']), and the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, iris_data['Sepal Width (cm)']).

Color-coded plots

# KDE plots for each species
sns.kdeplot(data=iris_set_data['Petal Length (cm)'], label="Iris-setosa", shade=True)
sns.kdeplot(data=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", shade=True)
sns.kdeplot(data=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", shade=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")

Text(0.5, 1.0, 'Distribution of Petal Lengths, by Species')
