Learning Python: Data Visualization

程序员文章站 2022-07-14 16:35:43

Commonly used Python packages

Matplotlib
Seaborn
Pandas
Bokeh
Plotly
Vispy
Vega
Vega-Lite

Visualization with Matplotlib

Installing Matplotlib

pip install matplotlib -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

If that fails, upgrade pip first and then install matplotlib:

python -m pip install -U pip setuptools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
python -m pip install matplotlib -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

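Matplotlib provides two modules

Plotting API: pyplot, the module normally used for visualization
Convenience module: pylab, which pulls pyplot and NumPy into a single namespace (the Matplotlib documentation now discourages its use)

Two display modes for Matplotlib in Jupyter

%matplotlib inline: static figures
%matplotlib notebook: interactive figures

plt.plot() draws on two-dimensional axes.
plt.show() displays the result.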

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"])
plt.show()

To draw several lines with one call, pass the x/y pairs one after another: plt.plot(x, y1, x, y2, x, y3, ...)

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 4.0, 0.1)
print(t)
plt.plot(t, t, t, t + 2, t, t ** 2, t, t + 8)
plt.show()

Changing plot properties

Setting the marker type
Pass a third positional argument to plt.plot(), such as 'o' (circles) or 'D' (diamonds).

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"],'o')
plt.show()
plt.plot(women["height"],women["weight"],'D')
plt.show()

Setting the line color and style
Change the third argument of plt.plot(): 'g--' is a green dashed line, 'rD' is red diamond markers.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"],'g--')
plt.show()
plt.plot(women["height"],women["weight"],'rD')
plt.show()
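
The third argument is a format string that combines up to three short codes: a color, a marker, and a line style. The sketch below is only an illustration (the variable names are not from the original article, and the Agg backend is chosen so it runs without a display). It draws one line with a format string and one with explicit keyword arguments, then reads the properties back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

height = [58, 59, 60, 61, 62]
weight = [115, 117, 120, 123, 126]

fig, ax = plt.subplots()

# Format string: 'g' = green, '--' = dashed line, no marker code.
(dashed,) = ax.plot(height, weight, "g--")

# The same kind of styling spelled out with keyword arguments:
# red diamond markers and no connecting line, like the 'rD' format string.
(diamonds,) = ax.plot(height, weight, color="r", marker="D", linestyle="None")

print(dashed.get_color(), dashed.get_linestyle())
print(diamonds.get_color(), diamonds.get_marker())
```

Keyword arguments are longer to type, but they are easier to read once a plot has more than a couple of styled lines.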

For the full list of codes, see these two articles:

https://blog.csdn.net/cjcrxzz/article/details/79627483
https://blog.csdn.net/sinat_36219858/article/details/79800460?utm_source=distribute.pc_relevant.none-task

Displaying Chinese characters

Set the font before calling plot.
Commonly used Chinese fonts: SimHei, KaiTi, LiSu, FangSong, YouYuan

plt.rcParams['font.family'] = 'SimHei'
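
Whether a font such as SimHei works depends on what is installed on the machine. The sketch below is a hedged illustration, not part of the original article: it checks Matplotlib's font list for the candidates named above and only then changes the configuration. It assumes the default font family is sans-serif, so it edits the `font.sans-serif` list instead of `font.family`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate fonts taken from the list above.
candidates = ["SimHei", "KaiTi", "LiSu", "FangSong", "YouYuan"]

# Names of every font Matplotlib has found on this system.
installed = {font.name for font in font_manager.fontManager.ttflist}
available = [name for name in candidates if name in installed]

if available:
    # Put the usable Chinese fonts in front of the existing fallbacks.
    plt.rcParams["font.sans-serif"] = available + list(plt.rcParams["font.sans-serif"])

# Many CJK fonts lack the Unicode minus sign; use the ASCII hyphen instead.
plt.rcParams["axes.unicode_minus"] = False

print("Usable Chinese fonts:", available or "none found")
```

If the list comes back empty, Chinese titles will render as empty boxes until one of these fonts is installed.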

Setting the title and axis labels

plt.title(), plt.xlabel(), and plt.ylabel() set the figure title, the x-axis label, and the y-axis label respectively.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.rcParams['font.family'] = 'SimHei'
plt.plot(women["height"], women["weight"], 'g--')
plt.title("此处为图名")
plt.xlabel("x轴的名称")
plt.ylabel("y轴的名称")
plt.show()

Legend position
First pass a label argument to plt.plot(), then call plt.legend(loc=...), where loc is the position, for example "upper left". The legend displays the text given as label.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.rcParams['font.family'] = 'SimHei'
plt.plot(women["height"], women["weight"], 'g--', label='weight')
plt.title("此处为图名")
plt.xlabel("x轴的名称")
plt.ylabel("y轴的名称")

plt.legend(loc="upper left")
plt.show()

Changing the plot type

plt.scatter() draws a scatter plot.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.scatter(women["height"], women["weight"])
plt.show()

Changing the axis ranges

Set the x-axis range: plt.xlim()
Set the y-axis range: plt.ylim()
Set both ranges at once: plt.axis()
np.linspace(0, 10, 100) returns 100 evenly spaced values over the closed interval [0, 10].

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.xlim(11, -2)  # x-axis runs from 11 down to -2 (reversed)
plt.ylim(2.2, -1.3)  # y-axis runs from 2.2 down to -1.3 (reversed)
plt.show()
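
Two details in the example above are easy to miss: np.linspace includes both endpoints, and passing a larger value first to plt.xlim() reverses the axis. The following sketch (illustrative only, using the object-oriented `ax.set_xlim` equivalent and the Agg backend so it runs without a display) checks both.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# 100 points, both endpoints included, so the step is 10 / 99 (not 10 / 100).
print(len(x), x[0], x[-1])
print(np.allclose(np.diff(x), 10 / 99))

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Giving the larger bound first flips the direction of the axis.
ax.set_xlim(11, -2)
print(ax.get_xlim(), ax.xaxis_inverted())
```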

plt.axis([a1, a2, b1, b2]) takes a single list: a1 and a2 are the x-axis limits, b1 and b2 are the y-axis limits.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis([-1, 21, -1.6, 1.6])
plt.show()

plt.axis("equal") makes one unit the same length on the x-axis and the y-axis.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis("equal")
plt.show()

Removing the whitespace at the edges

plt.axis("tight")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis("tight")
plt.show()

Drawing two curves on the same axes

Call plt.plot() more than once before plt.show().

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x),label="sin(x)")
plt.plot(x, np.cos(x),label="cos(x)")
plt.axis("tight")
plt.legend()
plt.show()

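Showing several plots in one figure

plt.subplot(x, y, z) divides the figure into a grid of x rows and y columns and draws the next plot in cell z, counting from 1 along each row.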

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.subplot(2, 3, 5)  # 5th cell of a 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.subplot(2, 3, 1)  # 1st cell of a 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.show()
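
The cell number z counts left to right, then top to bottom, so in a 2x3 grid cell 5 is the middle of the second row. A minimal sketch of the same layout with plt.subplots(), which returns the whole grid as an array of axes (the loop and titles are illustrative additions, and the Agg backend is used so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

height = [58, 59, 60, 61, 62]
weight = [115, 117, 120, 123, 126]

# Create all six axes at once; axes is a 2x3 NumPy array.
fig, axes = plt.subplots(2, 3)
nrows, ncols = axes.shape

for z in (1, 5):
    # Convert the 1-based cell number into a 0-based (row, column) pair.
    row, col = divmod(z - 1, ncols)
    axes[row, col].scatter(height, weight)
    axes[row, col].set_title(f"cell {z}")
```

Unlike plt.subplot(), which only creates the cells that are requested, plt.subplots() creates every cell, so the four unused cells appear as empty axes.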

Saving a figure

Replace plt.show() with plt.savefig("name.format"); the file extension selects the image format.
The file is saved in the current working directory.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.subplot(2, 3, 5)  # 5th cell of a 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.subplot(2, 3, 1)  # 1st cell of a 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.savefig("sagefig.png")
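
A slightly fuller sketch of saving is shown below. The output path, file name, and the dpi value are assumptions made for illustration; the article itself only uses the bare file name. A temporary directory is used so the example does not leave files in the working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

height = [58, 59, 60, 61, 62]
weight = [115, 117, 120, 123, 126]

fig, ax = plt.subplots()
ax.scatter(height, weight)

# Hypothetical output location for this sketch.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "women.png")

# dpi controls the resolution; bbox_inches="tight" trims the outer margin.
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure once it has been written

print(out_path, os.path.getsize(out_path))
```

If both are needed, call savefig() before show(): once the window opened by show() is closed, the figure is gone and a later savefig() writes a blank image.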

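Drawing scatter plots

Installing scikit-learn (imported as sklearn)

pip install scikit-learn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

make_blobs: generates a random dataset of normally distributed clusters
Parameters:

n_samples: number of samples, i.e. rows
n_features: number of features per sample, i.e. columns
centers: number of clusters
random_state: seed for the random number generator, which makes the result reproducible
cluster_std: standard deviation of each cluster

Return values:

X: the generated samples, an array of shape [n_samples, n_features]
y: the cluster label of each sample, an array of shape [n_samples]

Arguments of plt.scatter()

X[:, 0] and X[:, 1] are the x and y coordinates
c is the color (here one value per point, taken from the labels)
s is the marker size
cmap is the colormap that turns the values in c into colors

from sklearn.datasets import make_blobs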
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=1.0)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="rainbow")
plt.show()
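
To make the shapes of X and y concrete, here is a NumPy-only sketch of the idea behind make_blobs. It is not scikit-learn's implementation: the center range of -10 to 10, the balanced label assignment, and all variable names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_centers, cluster_std = 300, 2, 4, 1.0

# Pick one random center per cluster (range chosen for this sketch).
centers = rng.uniform(-10, 10, size=(n_centers, n_features))

# Assign labels 0..3 in turn so every cluster gets the same number of points.
y = np.arange(n_samples) % n_centers

# Each sample is its cluster center plus Gaussian noise with std cluster_std.
X = centers[y] + rng.normal(scale=cluster_std, size=(n_samples, n_features))

print(X.shape, y.shape, np.bincount(y))
```

X has one row per sample and one column per feature, and y holds one integer label per row, which is why y can be passed directly as the c argument of plt.scatter().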

Visualization with Pandas

Pandas has its own plotting functions, which make it easier to visualize a DataFrame.
The kind argument of DataFrame.plot() selects the chart type.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar")
plt.show()

kind="barh" draws a horizontal bar chart.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="barh")
plt.show()

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar", x="height", y="weight", color='g')
plt.show()
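
DataFrame.plot() returns the Matplotlib Axes it drew on, so the result can be adjusted afterwards with ordinary Matplotlib calls. The sketch below is an illustration only: it uses the accessor form `women.plot.bar()`, which is equivalent to `kind="bar"`, and counts the bars to show the difference between plotting every column and plotting one column against another.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

women = pd.DataFrame({"height": [58, 59, 60, 61, 62],
                      "weight": [115, 117, 120, 123, 126]})

# Without x/y, every column is drawn: 5 rows x 2 columns = 10 bars.
ax_all = women.plot(kind="bar")

# With x and y, height labels the x-axis and only weight is drawn: 5 bars.
ax_one = women.plot.bar(x="height", y="weight", color="g")
ax_one.set_ylabel("weight")  # the returned Axes accepts normal Matplotlib calls

print(len(ax_all.patches), len(ax_one.patches))
```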

kind="kde" draws a kernel density estimate curve for each column (this chart type requires SciPy to be installed).

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="kde")
plt.show()

plt.legend(loc="best") lets Matplotlib pick the legend position that overlaps the plotted data the least.

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar", x="height", y="weight", color='g')
plt.legend(loc="best")
plt.show()

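Visualization with Seaborn

cumsum computes cumulative sums (MATLAB has a function of the same name, B = cumsum(A, dim) or B = cumsum(A)). In the code below, np.cumsum(a, 0) accumulates down each column, which turns every column of random steps into a random walk.
plt.legend() configures the legend:

Legend entries: "abcdef", one letter per line
Number of legend columns: ncol = 2
Legend position: loc = "upper left"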

import matplotlib.pyplot as plt
import numpy as np

plt.style.use("classic")
Rng = np.random.RandomState(0)
X = np.linspace(0, 10, 500)  # 500 evenly spaced values from 0 to 10
y = np.cumsum(Rng.randn(500, 6), 0)
plt.plot(X, y)
plt.legend("abcdef", ncol=2, loc="upper left")
plt.show()
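
The second argument of np.cumsum is the axis along which values are accumulated. A small worked example on a 3x2 array (the array is made up for illustration) shows the difference between axis 0 and axis 1:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# axis=0: running total down each column, as in np.cumsum(Rng.randn(500, 6), 0).
down = np.cumsum(a, axis=0)

# axis=1: running total across each row.
across = np.cumsum(a, axis=1)

print(down)
print(across)
```

The output keeps the shape of the input, which is why y in the example above still has 500 rows and 6 columns and plt.plot(X, y) draws six lines.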

Installing Seaborn

pip install seaborn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Calling sns.set() applies Seaborn's default theme, which makes the same Matplotlib figure look nicer.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

plt.style.use("classic")
Rng = np.random.RandomState(0)
X = np.linspace(0, 10, 500)
y = np.cumsum(Rng.randn(500, 6), 0)
sns.set()
plt.plot(X, y)
plt.legend("abcdef", ncol=2, loc="upper left")
plt.show()

Kernel density estimate (KDE) plot

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.kdeplot(women.height, fill=True)  # fill replaces the deprecated shade argument
plt.show()

sns.distplot() draws a histogram with a KDE curve on top. It has been deprecated since seaborn 0.11; sns.histplot(..., kde=True) gives a similar figure.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.histplot(women.height, kde=True, stat="density")  # replacement for the deprecated sns.distplot()
plt.show()

sns.pairplot() draws a scatter-plot matrix: one panel for every pair of columns.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.pairplot(women)
plt.show()

sns.jointplot() draws a joint distribution plot; kind="reg" adds a regression line.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.jointplot(x=women.height, y=women.weight, kind="reg")  # x and y must be keyword arguments in seaborn 0.12+
plt.show()

A with block can also change the style, and only for the plots drawn inside it. Remember the trailing colon and the indentation.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
with sns.axes_style("white"):
    sns.jointplot(x=women.height, y=women.weight, kind="reg")
plt.show()

plt.hist() draws a histogram.
You can also plot inside a for loop to draw several variables on the same axes.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
for x in ["height", "weight"]:
    plt.hist(women[x], density=True, alpha=0.5)  # density replaces the removed normed argument
plt.show()
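
With density=True the bar heights are no longer counts: they are scaled so that the total area of the bars equals 1, which is what makes histograms of different variables comparable. A minimal check of that property (the bins=5 setting and variable names are choices made for this sketch; the Agg backend is used so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

values = [58, 59, 60, 61, 62]

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the drawn patches.
heights, edges, _ = ax.hist(values, bins=5, density=True, alpha=0.5)

# Area of each bar = height * width; the areas sum to 1 when density=True.
area = np.sum(heights * np.diff(edges))
print(area)
```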

For more on Seaborn, see:

https://www.jianshu.com/p/844f66d00ac1

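Data visualization in practice

Preparing the data

Check the current working directory, which is where salaries.csv is read from:

import os
print(os.getcwd())  # E:\py_workspace\test2

Use read_csv() from pandas to load the file into the in-memory object salaries.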

import pandas as pd

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses the first column of the file as the row index

Inspecting the data

import pandas as pd

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses the first column of the file as the row index
print(salaries.head())
'''
       rank discipline  yrs.since.phd  yrs.service   sex  salary
1      Prof          B             19           18  Male  139750
2      Prof          B             20           16  Male  173200
3  AsstProf          B              4            3  Male   79750
4      Prof          B             45           39  Male  115000
5      Prof          B             40           41  Male  141500
'''

Importing the packages

import seaborn as sns
import matplotlib.pyplot as plt

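Plotting

sns.set_style('darkgrid') sets Seaborn's plotting style (theme) to darkgrid, a gray background with grid lines.
sns.stripplot() draws a categorical scatter plot (strip plot).
Parameters:

data: the data source
x: the column shown on the x-axis
y: the column shown on the y-axis
jitter: whether to add random jitter so that overlapping points spread out
alpha: the opacity of the points

sns.boxplot() draws a box plot.
Parameters:

data: the data source
x: the column shown on the x-axis
y: the column shown on the y-axis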

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses the first column of the file as the row index
print(salaries.head())
'''
       rank discipline  yrs.since.phd  yrs.service   sex  salary
1      Prof          B             19           18  Male  139750
2      Prof          B             20           16  Male  173200
3  AsstProf          B              4            3  Male   79750
4      Prof          B             45           39  Male  115000
5      Prof          B             40           41  Male  141500
'''
sns.set_style('darkgrid')
sns.stripplot(data=salaries, x='rank', y='salary', jitter=True, alpha=0.5)
sns.boxplot(data=salaries, x='rank', y='salary')
plt.show()
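
Each box in the box plot summarizes one rank by its quartiles. As a rough sketch of what is being drawn, the quartiles can be computed with a pandas groupby. Only the five rows printed by salaries.head() above are used here, so the numbers illustrate the calculation and are not results for the full file.

```python
import pandas as pd

# The five rows shown in the head() output above (rank and salary only).
sample = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf", "Prof", "Prof"],
    "salary": [139750, 173200, 79750, 115000, 141500],
})

# Lower quartile, median, and upper quartile of salary for each rank;
# unstack() turns the quantile level into columns 0.25, 0.5, 0.75.
summary = sample.groupby("rank")["salary"].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)
```

The 0.5 column is the line inside each box, and the 0.25 and 0.75 columns are the bottom and top edges of the box.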
