
Python Time Series Visualization Examples Compiled by an Expert, Worth Bookmarking (Code Included)

程序员文章站 2022-04-05 23:02:17

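Preface

The text and images in this article come from the internet and are provided for learning and exchange only, with no commercial purpose. Copyright belongs to the original authors; if there is any problem, please contact us promptly so that we can deal with it.

Time Series

1. Time Series Plot

A time series plot visualises how a given metric changes over time. Here you can see how air passenger traffic changed between 1949 and 1960.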

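# import the libraries used by the listings in this article
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

# import data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')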

# draw plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:red')

# decoration
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::12]
xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
plt.yticks(fontsize=12, alpha=.7)
plt.title("air passengers traffic (1949 - 1960)", fontsize=22)
plt.grid(axis='both', alpha=.3)

# remove borders
plt.gca().spines["top"].set_alpha(0.0)    
plt.gca().spines["bottom"].set_alpha(0.3)
plt.gca().spines["right"].set_alpha(0.0)    
plt.gca().spines["left"].set_alpha(0.3)   
plt.show()

 

2. Time Series Plot with Markers

The time series below plots all the peaks and troughs and annotates the dates of selected special events.


# import data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# get the peaks and troughs
data = df['traffic'].values
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1

doublediff2 = np.diff(np.sign(np.diff(-1*data)))
trough_locations = np.where(doublediff2 == -2)[0] + 1

# draw plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:blue', label='air traffic')
plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='peaks')
plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='troughs')

# annotate
for t, p in zip(trough_locations[1::5], peak_locations[::3]):
    plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')
    plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')

# decoration
plt.ylim(50,750)
xtick_location = df.index.tolist()[::6]
xtick_labels = df.date.tolist()[::6]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
plt.title("peak and troughs of air passengers traffic (1949 - 1960)", fontsize=22)
plt.yticks(fontsize=12, alpha=.7)

# lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.3)

plt.legend(loc='upper left')
plt.grid(axis='y', alpha=.3)
plt.show()
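The peak and trough detection in the listing above relies on the sign of the first difference changing direction. The following is a minimal sketch of that idea on a toy series; the function name `find_peaks_and_troughs` and the numbers are illustrative assumptions, not part of the original listing. Testing the second difference for `+2` is equivalent to the listing's approach of negating the data and testing for `-2`.

```python
import numpy as np

def find_peaks_and_troughs(values):
    """Return index arrays of strict local maxima and strict local minima."""
    values = np.asarray(values, dtype=float)
    # sign of the first difference: +1 rising, -1 falling, 0 flat
    slope_sign = np.sign(np.diff(values))
    # rising then falling gives -2 (peak); falling then rising gives +2 (trough)
    turn = np.diff(slope_sign)
    peaks = np.where(turn == -2)[0] + 1    # +1 re-aligns with the original index
    troughs = np.where(turn == 2)[0] + 1
    return peaks, troughs

# toy series (made-up numbers, not the air passenger data)
series = [1, 3, 2, 0, 4, 5, 3]
peaks, troughs = find_peaks_and_troughs(series)
print(peaks.tolist(), troughs.tolist())    # [1, 5] [3]
```

Note that flat stretches (a sign of 0) are not reported as peaks or troughs by this test, which is usually acceptable for noisy real-valued data.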

 

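3. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plots

The ACF plot shows the correlation of a time series with its own lags. Each vertical line (on the autocorrelation plot) represents the correlation between the series and one of its lags, starting from lag 0. The blue shaded region in the plot is the significance level. Lags that extend beyond the blue band are statistically significant.

So how should this be interpreted?

For AirPassengers, we see that as many as 14 lags cross the blue line and are therefore significant. Because the data are monthly, this means that the air passenger traffic from up to 14 months back has an influence on the traffic seen today.

The PACF, on the other hand, shows the autocorrelation between any given lag (of the time series) and the current series, but with the contributions of the lags in between removed.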
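To make the two definitions concrete, here is a small NumPy-only sketch that computes the sample ACF, derives the PACF from the Yule-Walker equations, and compares both against an approximate 95% band of ±1.96/√N. The helper names `acf` and `pacf`, the synthetic AR(1) series, and the simple band formula are assumptions for illustration; in practice the `plot_acf` and `plot_pacf` functions in `statsmodels.graphics.tsaplots` draw these charts directly.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelation obtained by solving the Yule-Walker equations."""
    r = acf(x, nlags)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Toeplitz matrix of autocorrelations r[|i - j|] for an AR(k) fit
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        out.append(phi[-1])    # last AR coefficient = partial autocorrelation at lag k
    return np.array(out)

# synthetic AR(1) series: x[t] = 0.8 * x[t-1] + noise (made-up data)
rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + noise[t]

band = 1.96 / np.sqrt(n)    # approximate 95% significance band
r, p = acf(x, 10), pacf(x, 10)
significant_acf_lags = [k for k in range(1, 11) if abs(r[k]) > band]
significant_pacf_lags = [k for k in range(1, 11) if abs(p[k]) > band]
```

For an AR(1) process the ACF decays slowly over many lags, while the PACF is large only at lag 1, which is the contrast the two plots are meant to reveal.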


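# listing: two series drawn against separate left and right y axes (the dual y-axis chart of section 7)
# import data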
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")

x = df['date']
y1 = df['psavert']
y2 = df['unemploy']

# plot line1 (left y axis)
fig, ax1 = plt.subplots(1,1,figsize=(16,9), dpi= 80)
ax1.plot(x, y1, color='tab:red')

# plot line2 (right y axis)
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2, color='tab:blue')

# decorations
# ax1 (left y axis)
ax1.set_xlabel('year', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('personal savings rate', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
ax1.grid(alpha=.4)

# ax2 (right y axis)
ax2.set_ylabel("# unemployed (1000's)", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(0, len(x), 60))
ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10})
ax2.set_title("personal savings rate vs unemployed: plotting in secondary y axis", fontsize=22)
fig.tight_layout()
plt.show()

 

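4. Cross-Correlation Plot

The cross-correlation plot shows how strongly two time series are correlated with each other at different lags.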
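As a rough illustration of what such a plot is built from, the sketch below computes the correlation between two standardised series at a range of lags and picks the lag with the strongest correlation. The function name `cross_correlation` and the synthetic data (one series is simply the other delayed by three steps) are assumptions made for this example only.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = -max_lag..max_lag."""
    x = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    y = (np.asarray(y, dtype=float) - np.mean(y)) / np.std(y)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = []
    for lag in lags:
        if lag >= 0:
            ccf.append(np.dot(x[:n - lag], y[lag:]) / n)
        else:
            ccf.append(np.dot(x[-lag:], y[:n + lag]) / n)
    return lags, np.array(ccf)

# made-up example: y is x delayed by 3 steps, plus a little noise
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)

lags, ccf = cross_correlation(x, y, max_lag=10)
best_lag = int(lags[np.argmax(ccf)])
print(best_lag)    # 3
```

Plotting `ccf` against `lags` as vertical bars gives the cross-correlation chart; a tall bar at a positive lag suggests that the first series leads the second by that many steps.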


# listing: mean orders per hour of day with a 95% error band (the first example of section 8)
from scipy.stats import sem

# import data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv")
df_mean = df.groupby('order_hour_of_day').quantity.mean()
df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)

# plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# orders", fontsize=16)  
x = df_mean.index
plt.plot(x, df_mean, color="white", lw=2) 
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3f5d7d")  

# decorations
# lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::2], [str(d) for d in x[::2]] , fontsize=12)
plt.title("user orders by hour of day (95% confidence)", fontsize=22)
plt.xlabel("hour of day")

s, e = plt.gca().get_xlim()
plt.xlim(s, e)

# draw horizontal tick lines  
for y in range(8, 20, 2):    
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()

 

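5. Time Series Decomposition Plot

The time series decomposition plot shows the breakdown of a time series into its trend, seasonal, and residual components.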
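The sketch below shows one common way to obtain the three components, a classical additive decomposition based on a centred moving average. It is an illustrative NumPy implementation under that assumption, with a made-up series; the function name `decompose_additive` is hypothetical. The `seasonal_decompose` function in `statsmodels.tsa.seasonal` offers a ready-made version together with a plotting helper.

```python
import numpy as np

def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # centred moving average; for an even period the two end weights are halved
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)    # undefined at both ends of the series
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # seasonal component: mean detrended value at each position in the cycle
    detrended = y - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()         # make the seasonal part average to zero
    seasonal = np.tile(means, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# made-up monthly series: linear trend plus a yearly sine wave
t = np.arange(120)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, residual = decompose_additive(y, period=12)
```

Drawing `y`, `trend`, `seasonal`, and `residual` in four stacked subplots that share the x axis reproduces the usual layout of a decomposition plot.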


 

6. Multiple Time Series Plot

You can plot multiple time series that measure the same quantity on the same chart.
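A minimal sketch of this kind of chart follows. The data are randomly generated and the column names `north`, `south`, and `west` are placeholders chosen for the example, so treat it as a template rather than a reproduction of any particular figure.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up monthly data: three series that measure the same quantity
dates = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(2)
df = pd.DataFrame(
    {name: 100 + rng.normal(0, 5, size=36).cumsum() for name in ["north", "south", "west"]},
    index=dates,
)

fig, ax = plt.subplots(figsize=(16, 9), dpi=80)
for column in df.columns:
    ax.plot(df.index, df[column], label=column)    # one line per series, shared axes
ax.set_ylabel("value")
ax.set_title("multiple time series on one chart")
ax.legend(loc="upper left")
fig.tight_layout()
# call plt.show() to display the figure in an interactive session
```

Because all series share one y axis, this layout only works well when they are measured in the same unit and have a comparable range.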


 

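7. Dual Y-Axis Plot

If you want to show two time series that measure two different quantities at the same points in time, you can plot the second series against a secondary y axis on the right.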
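The core of the technique is matplotlib's `Axes.twinx()`, which creates a second axes object that shares the x axis with the first. The sketch below uses made-up data and placeholder labels (`rate (%)`, `count`) to show the pattern in isolation.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up data: two quantities on very different scales
x = np.arange(48)
rate = 5 + np.sin(x / 6)      # for example, a percentage
count = 8000 + 50 * x         # for example, a head count

fig, ax_left = plt.subplots(figsize=(16, 9), dpi=80)
ax_left.plot(x, rate, color="tab:red")
ax_left.set_ylabel("rate (%)", color="tab:red")
ax_left.tick_params(axis="y", labelcolor="tab:red")

ax_right = ax_left.twinx()    # second axes sharing the same x axis
ax_right.plot(x, count, color="tab:blue")
ax_right.set_ylabel("count", color="tab:blue")
ax_right.tick_params(axis="y", labelcolor="tab:blue")
fig.tight_layout()
# call plt.show() to display the figure in an interactive session
```

Colouring each axis label and its tick labels to match the corresponding line makes it clear which scale belongs to which series.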


 

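8. Time Series with Error Bands

If you have a time series dataset with multiple observations for each time point (date/timestamp), you can construct a time series with error bands. The examples here are based on orders placed at different times of the day, and on the number of orders arriving over a period of 45 days.

In this approach, the mean number of orders is drawn as a white line. A 95% confidence band is then computed and drawn around the mean.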
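The band is the mean plus or minus 1.96 times the standard error of the mean. The sketch below works through that arithmetic on a tiny made-up table; the column names and values are assumptions for illustration. The standard error is computed as the sample standard deviation (ddof=1) divided by √n, which matches the default behaviour of `scipy.stats.sem`.

```python
import numpy as np
import pandas as pd

# made-up observations: several order quantities recorded for each hour
obs = pd.DataFrame({
    "hour":     [9, 9, 9, 9, 10, 10, 10, 10],
    "quantity": [4, 6, 5, 5, 8, 10, 9, 13],
})

grouped = obs.groupby("hour")["quantity"]
mean = grouped.mean()
# standard error of the mean: sample std (ddof=1) / sqrt(number of observations)
std_err = grouped.std(ddof=1) / np.sqrt(grouped.count())
lower = mean - 1.96 * std_err
upper = mean + 1.96 * std_err

bands = pd.DataFrame({"mean": mean, "lower": lower, "upper": upper})
print(bands.round(2))
```

Passing `lower` and `upper` to `plt.fill_between` and drawing `mean` on top gives the error-band chart.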


"data source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv"
from dateutil.parser import parse
from scipy.stats import sem

# import data
df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv', 
                     parse_dates=['purchase_time', 'purchase_date'])

# prepare data: daily mean and se bands
df_mean = df_raw.groupby('purchase_date').quantity.mean()
df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)

# plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# daily orders", fontsize=16)  
x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index]
plt.plot(x, df_mean, color="white", lw=2) 
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3f5d7d")  

# decorations
# lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::6], [str(d) for d in x[::6]] , fontsize=12)
plt.title("daily order quantity of brazilian retail with error bands (95% confidence)", fontsize=20)

# axis limits
s, e = plt.gca().get_xlim()
plt.xlim(s, e-2)
plt.ylim(4, 10)

# draw horizontal tick lines  
for y in range(5, 10, 1):    
    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

plt.show()

 

 


 

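9. Stacked Area Chart

A stacked area chart gives a visual representation of how much each of several time series contributes, so that they are easy to compare against each other.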


# import data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/nightvisitors.csv')

# decide colors 
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive']      

# draw plot and annotate
fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80)
columns = df.columns[1:]
labs = columns.values.tolist()

# prepare data
x  = df['yearmon'].values.tolist()
y0 = df[columns[0]].values.tolist()
y1 = df[columns[1]].values.tolist()
y2 = df[columns[2]].values.tolist()
y3 = df[columns[3]].values.tolist()
y4 = df[columns[4]].values.tolist()
y5 = df[columns[5]].values.tolist()
y6 = df[columns[6]].values.tolist()
y7 = df[columns[7]].values.tolist()
# stack the series in column order so that each area matches its legend label in `labs`
y = np.vstack([y0, y1, y2, y3, y4, y5, y6, y7])

# plot for each column
labs = columns.values.tolist()
ax = plt.gca()
ax.stackplot(x, y, labels=labs, colors=mycolors, alpha=0.8)

# decorations
ax.set_title('night visitors in australian regions', fontsize=18)
ax.set(ylim=[0, 100000])
ax.legend(fontsize=10, ncol=4)
plt.xticks(x[::5], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(10000, 100000, 20000), fontsize=10)
plt.xlim(x[0], x[-1])

# lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)

plt.show()

 

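10. Area Chart (Unstacked)

An unstacked area chart is used to visualise the progress (ups and downs) of two or more series relative to each other. In the chart for this example, you can clearly see how the personal savings rate comes down as the median duration of unemployment increases. The unstacked area chart brings out this phenomenon nicely.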
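A minimal sketch of an unstacked area chart follows, using `fill_between` once per series with a low alpha so that overlapping regions stay visible. The two series are made-up stand-ins for the economic indicators mentioned above, not the actual data.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up data standing in for two indicators that move in opposite directions
x = np.arange(100)
series_a = 12 - 0.08 * x + np.sin(x / 5)    # drifts down
series_b = 5 + 0.10 * x + np.cos(x / 7)     # drifts up

fig, ax = plt.subplots(figsize=(16, 9), dpi=80)
# each area is filled from zero up to its own series (not stacked on the other)
ax.fill_between(x, series_a, color="tab:red", alpha=0.25, label="series a")
ax.fill_between(x, series_b, color="tab:blue", alpha=0.25, label="series b")
ax.set_xlim(x[0], x[-1])
ax.legend(loc="best")
fig.tight_layout()
# call plt.show() to display the figure in an interactive session
```

Unlike `stackplot`, each area here starts from the baseline, so the height of an area can be read directly as the value of its series.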

 


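11. Calendar Heat Map

A calendar heat map is an alternative, and less preferred, way of visualising time-based data compared with a time series plot. Although it can be visually appealing, the numeric values are not readily apparent. It is, however, effective at picturing extreme values and holiday effects.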


import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import calmap  # third-party package (pip install calmap)

# import data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv", parse_dates=['date'])
df.set_index('date', inplace=True)

# plot: select the 2014 rows by partial date string and the closing-value column by label
plt.figure(figsize=(16,10), dpi= 80)
calmap.calendarplot(df.loc['2014', 'VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'yahoo stock prices'})
plt.show()

 

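12. Seasonal Plot

A seasonal plot can be used to compare how the time series performed on the same day of the previous season (year/month/week, etc.).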


from dateutil.parser import parse 

# import data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# prepare data
df['year'] = [parse(d).year for d in df.date]
df['month'] = [parse(d).strftime('%b') for d in df.date]
years = df['year'].unique()

# draw plot
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive', 'deeppink', 'steelblue', 'firebrick', 'mediumseagreen']      
plt.figure(figsize=(16,10), dpi= 80)

for i, y in enumerate(years):
    plt.plot('month', 'traffic', data=df.loc[df.year==y, :], color=mycolors[i], label=y)
    plt.text(df.loc[df.year==y, :].shape[0]-.9, df.loc[df.year==y, 'traffic'][-1:].values[0], y, fontsize=12, color=mycolors[i])

# decoration
plt.ylim(50,750)
plt.xlim(-0.3, 11)
plt.ylabel('$air traffic$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("monthly seasonal plot: air passengers traffic (1949 - 1960)", fontsize=22)
plt.grid(axis='y', alpha=.3)

# remove borders
plt.gca().spines["top"].set_alpha(0.0)    
plt.gca().spines["bottom"].set_alpha(0.5)
plt.gca().spines["right"].set_alpha(0.0)    
plt.gca().spines["left"].set_alpha(0.5)   
# plt.legend(loc='upper right', ncol=2, fontsize=12)
plt.show()

 
