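Data Analysis: A Summary of Common Seaborn Plot Types | Plotting to Find Trends, Relationships, and Distributions | 20-Minute Crash Course | Kaggle Learning (Part 1)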

程序员文章站 (Programmer Article Station), 2022-06-15 23:47:06


  • Trends

    - A trend is defined as a pattern of change.

    • sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.
  • Relationship

    - There are many different chart types that you can use to understand relationships between variables in your data.

    • sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
    • sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
    • sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
    • sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
    • sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
    • sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
  • Distribution

    - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.

    • sns.distplot - Histograms show the distribution of a single numerical variable.
    • sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
    • sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

1. Line Chart

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Path of the file to read
spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)

spotify_data.tail()
            Shape of You  Despacito  Something Just Like This    HUMBLE.  Unforgettable
Date
2018-01-05       4492978  3450315.0                 2408365.0  2685857.0      2869783.0
2018-01-06       4416476  3394284.0                 2188035.0  2559044.0      2743748.0
2018-01-07       4009104  3020789.0                 1908129.0  2350985.0      2441045.0
2018-01-08       4135505  2755266.0                 2023251.0  2523265.0      2622693.0
2018-01-09       4168506  2791601.0                 2058016.0  2727678.0      2627334.0
# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2bb6f98>


# Set the width and height of the figure
# sets the size of the figure to 14 inches (in width) by 6 inches (in height)
plt.figure(figsize=(14, 6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2a74780>


Changing styles

# Seaborn has five different themes: (1) "darkgrid", (2) "whitegrid", (3) "dark", (4) "white", and (5) "ticks"
# Change the style of the figure to the "dark" theme
sns.set_style("dark")

# Line chart 
plt.figure(figsize=(12,6))
sns.lineplot(data=spotify_data)
<matplotlib.axes._subplots.AxesSubplot at 0x7f5faa4bc828>


Plot a subset of the data

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")


2. Bar Charts

# Print the data
flight_data
AA AS B6 DL EV F9 HA MQ NK OO UA US VX WN
Month
1 6.955843 -0.320888 7.347281 -2.043847 8.537497 18.357238 3.512640 18.164974 11.398054 10.889894 6.352729 3.107457 1.420702 3.389466
2 7.530204 -0.782923 18.657673 5.614745 10.417236 27.424179 6.029967 21.301627 16.474466 9.588895 7.260662 7.114455 7.784410 3.501363
3 6.693587 -0.544731 10.741317 2.077965 6.730101 20.074855 3.468383 11.018418 10.039118 3.181693 4.892212 3.330787 5.348207 3.263341
4 4.931778 -3.009003 2.780105 0.083343 4.821253 12.640440 0.011022 5.131228 8.766224 3.223796 4.376092 2.660290 0.995507 2.996399
5 5.173878 -1.716398 -0.709019 0.149333 7.724290 13.007554 0.826426 5.466790 22.397347 4.141162 6.827695 0.681605 7.102021 5.680777
6 8.191017 -0.220621 5.047155 4.419594 13.952793 19.712951 0.882786 9.639323 35.561501 8.338477 16.932663 5.766296 5.779415 10.743462
7 3.870440 0.377408 5.841454 1.204862 6.926421 14.464543 2.001586 3.980289 14.352382 6.790333 10.262551 NaN 7.135773 10.504942
8 3.193907 2.503899 9.280950 0.653114 5.154422 9.175737 7.448029 1.896565 20.519018 5.606689 5.014041 NaN 5.106221 5.532108
9 -1.432732 -1.813800 3.539154 -3.703377 0.851062 0.978460 3.696915 -2.167268 8.000101 1.530896 -1.794265 NaN 0.070998 -1.336260
10 -0.580930 -2.993617 3.676787 -5.011516 2.303760 0.082127 0.467074 -3.735054 6.810736 1.750897 -2.456542 NaN 2.254278 -0.688851
11 0.772630 -1.916516 1.418299 -3.175414 4.415930 11.164527 -2.719894 0.220061 7.543881 4.925548 0.281064 NaN 0.116370 0.995684
12 4.149684 -1.846681 13.839290 2.504595 6.685176 9.346221 -1.706475 0.662486 12.733123 10.947612 7.012079 NaN 13.498720 6.720893
# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
Text(0, 0.5, 'Arrival delay (in minutes)')
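The same pattern, reduced to a self-contained sketch: the table's index supplies the categories on the x-axis and one column supplies the bar heights. The airline code `XX` and the delay values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up average delays for a hypothetical airline "XX", indexed by month
demo_flights = pd.DataFrame(
    {"XX": [3.0, 1.5, 7.2, 0.4]},
    index=pd.Index([1, 2, 3, 4], name="Month"),
)

plt.figure(figsize=(10, 6))
plt.title("Average arrival delay by month (illustrative data)")

# x: the index (one category per month), y: the selected column
ax = sns.barplot(x=demo_flights.index, y=demo_flights["XX"])
plt.ylabel("Arrival delay (in minutes)")
n_bars = len(ax.patches)
```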


3. Heatmap

# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")
Text(0.5, 42.0, 'Airline')
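The flight table contains `NaN` entries, and `sns.heatmap` masks missing cells automatically, so they show up blank instead of colored. Here is a small sketch with made-up values and hypothetical airline codes `XX` and `YY`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up delays; one value is missing on purpose
demo_flights = pd.DataFrame(
    {"XX": [3.0, 1.5, 7.2], "YY": [0.4, np.nan, 2.1]},
    index=pd.Index([1, 2, 3], name="Month"),
)
n_missing = int(demo_flights.isna().sum().sum())

plt.figure(figsize=(6, 4))
plt.title("Average arrival delay (illustrative data)")

# annot=True writes each value into its cell; the NaN cell is left blank
ax = sns.heatmap(data=demo_flights, annot=True)
plt.xlabel("Airline")
```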


4. Scatter Plots

insurance_data.head()
age sex bmi children smoker region charges
0 19 female 27.900 0 yes southwest 16884.92400
1 18 male 33.770 1 no southeast 1725.55230
2 28 male 33.000 3 no southeast 4449.46200
3 33 male 22.705 0 no northwest 21984.47061
4 32 male 28.880 0 no northwest 3866.85520
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])
<matplotlib.axes._subplots.AxesSubplot at 0x7f44f2300048>


# Add a regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])
<matplotlib.axes._subplots.AxesSubplot at 0x7f44f222c588>
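`sns.regplot` draws the fitted line but does not return its coefficients. If the slope and intercept are needed as numbers, one option is to fit the same kind of straight line separately, for example with `numpy.polyfit`. The points below are made up and lie exactly on y = 2x + 1, so the result is easy to check.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Made-up points lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, deg=1)

plt.figure(figsize=(6, 4))
# ci=None skips the bootstrapped confidence band to keep the sketch fast
ax = sns.regplot(x=x, y=y, ci=None)
```

A positive slope here corresponds to the upward-sloping line in the plot.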


Color-coded scatter plots

# Color-code the points by 'smoker' and plot the other two columns ('bmi', 'charges') on the axes
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])
<matplotlib.axes._subplots.AxesSubplot at 0x7f44f19b49e8>


# Add two regression lines, corresponding to smokers and nonsmokers
# Instead of setting x=insurance_data['bmi'] to select the 'bmi' column in insurance_data, we set x="bmi" to specify the name of the column only.
# Similarly, y="charges" and hue="smoker" also contain the names of columns.
# We specify the dataset with data=insurance_data.

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)
<seaborn.axisgrid.FacetGrid at 0x7f44f192d668>
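The column-name form described in the comments above can be tried on a small made-up table. In this sketch the values are placeholders, not the insurance data; `lmplot` returns a `FacetGrid`, and the single Axes is reached through `g.axes[0, 0]`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import pandas as pd
import seaborn as sns

# Made-up rows with the same column names as insurance_data
demo_insurance = pd.DataFrame(
    {
        "bmi": [22.0, 27.5, 33.0, 24.0, 29.0, 35.5],
        "charges": [2000.0, 3500.0, 4200.0, 15000.0, 21000.0, 40000.0],
        "smoker": ["no", "no", "no", "yes", "yes", "yes"],
    }
)

# Column names are given as strings; data= tells seaborn where to look them up
g = sns.lmplot(x="bmi", y="charges", hue="smoker", data=demo_insurance, ci=None)

ax = g.axes[0, 0]
n_regression_lines = len(ax.lines)  # one fitted line per 'smoker' group
```

The axis labels are taken from the column names, which is one practical advantage of the `data=` form.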


sns.swarmplot(x=insurance_data['smoker'],
              y=insurance_data['charges'])


5. Histograms

iris_data.head()
    Sepal Length (cm)  Sepal Width (cm)  Petal Length (cm)  Petal Width (cm)      Species
Id
1                 5.1               3.5                1.4               0.2  Iris-setosa
2                 4.9               3.0                1.4               0.2  Iris-setosa
3                 4.7               3.2                1.3               0.2  Iris-setosa
4                 4.6               3.1                1.5               0.2  Iris-setosa
5                 5.0               3.6                1.4               0.2  Iris-setosa
# a= chooses the column we'd like to plot (in this case, 'Petal Length (cm)')
# kde=False is something we'll always provide when creating a histogram, as leaving it out will create a slightly different plot.
sns.distplot(a=iris_data['Petal Length (cm)'], kde=False)
<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5b1da20>


Color-coded plots

# Histograms for each species
sns.distplot(a=iris_set_data['Petal Length (cm)'], label="Iris-setosa", kde=False)
sns.distplot(a=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", kde=False)
sns.distplot(a=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", kde=False)

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# Force legend to appear
plt.legend()
<matplotlib.legend.Legend at 0x7f96c5849470>


6. Density Plots

# A kernel density estimate (KDE) plot is like a smoothed histogram
# 'shade=True' colors the area below the curve
sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)
<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5a664e0>


# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")
<seaborn.axisgrid.JointGrid at 0x7f96c59cbef0>

The color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.

In addition to the 2D KDE plot in the center,

  • the curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, iris_data['Petal Length (cm)']), and
  • the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, iris_data['Sepal Width (cm)']).

Color-coded plots

# KDE plots for each species
sns.kdeplot(data=iris_set_data['Petal Length (cm)'], label="Iris-setosa", shade=True)
sns.kdeplot(data=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", shade=True)
sns.kdeplot(data=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", shade=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")
Text(0.5, 1.0, 'Distribution of Petal Lengths, by Species')

