Recognizing the MNIST Dataset with a BP Neural Network in Python: Mini-Batch Version

程序员文章站 2022-04-15 23:09:10


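Contents:

1. The MNIST dataset

1.1 What is the MNIST dataset

1.2 Loading the MNIST dataset

2. Neural network

2.1 Batch processing

2.2 Forward propagation

2.2.1 The sigmoid and softmax functions

2.2.2 Loss function

2.2.3 Recognition accuracy

2.3 Backpropagation

2.4 Building the neural network

3. Training the neural network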

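1. The MNIST dataset

A common example when building neural networks for machine learning and deep learning is to use train_img and test_img from the MNIST dataset as the network input, with train_label and test_label as the corresponding target labels.
This section briefly introduces the MNIST dataset and shows how to load and preprocess it.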

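1.1 What is the MNIST dataset
MNIST is a dataset of handwritten digit images. It contains 60,000 training samples and 10,000 test samples: 60,000 train_img with their corresponding train_label, and 10,000 test_img with their corresponding test_label.
[Figure: sample handwritten digit images from MNIST]
train_img and test_img are images of this kind. train_img is the training data used to train the network, and test_img is the test data used to evaluate it. Each image is 28×28 pixels and is flattened into 28×28 = 784 pixel values. Each value lies between 0 and 255 and represents a gray level, so one image becomes a 1×784 matrix that serves as the network input. The network output is a 1×10 matrix, e.g. [0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01]. Each entry is the predicted probability of one digit class, with the entries corresponding to the digits 0 through 9 in order. Here 0.8 is the fifth entry (index 4), so it is the predicted probability that the image shows the digit 4.

train_label and test_label are the labels for the training and test data. Each label can be viewed as a 1×10 matrix in one-hot form (only the correct class is 1). When one_hot_label is True, labels are returned as one-hot arrays, for example [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. The 1 is in the fifth position (index 4), so this label stands for the digit 4.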
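
The one-hot representation described above can be sketched in a few lines of NumPy. This is only an illustration: to_one_hot is a hypothetical helper written for this example, not a function from mnist.py, and the three labels are made up.

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Turn integer labels (0-9) into one-hot rows (hypothetical helper)."""
    one_hot = np.zeros((labels.size, num_classes))
    one_hot[np.arange(labels.size), labels] = 1  # set one entry per row
    return one_hot

labels = np.array([4, 0, 9])         # three made-up digit labels
encoded = to_one_hot(labels)
decoded = encoded.argmax(axis=1)     # argmax recovers the integer labels

print(encoded[0])   # one-hot row for the digit 4: the 1 sits at index 4
print(decoded)
```

Because the index of the 1 is the digit itself, argmax along each row converts a one-hot label back to an integer, which is what the loss and accuracy code later relies on.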

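1.2 Loading the MNIST dataset
load_mnist(normalize=True, flatten=True, one_hot_label=False) takes three parameters:
normalize: whether to scale the pixel values to 0.0~1.0 (normalizing pixel values helps improve accuracy)
flatten: whether to flatten each image into a one-dimensional array
one_hot_label: whether to return the labels in one-hot form.
The source is in mnist.py at https://gitee.com/ldy1118/netural-network and can be called directly. The MNIST data must be downloaded beforehand from the official MNIST site, http://yann.lecun.com/exdb/mnist/ (the four files marked in red), and placed in the same directory as mnist.py: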

from mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=True)
print(x_train.shape, t_train.shape, x_test.shape, t_test.shape)

Output: (60000, 784) (60000, 10) (10000, 784) (10000, 10)
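
To make the normalize and flatten options concrete, here is a minimal sketch of what they are described as doing, applied to a random stand-in for one raw image. It does not reproduce the code in mnist.py; the variable names and the random image are assumptions for illustration.

```python
import numpy as np

# Stand-in for one raw MNIST image: 28x28 pixels with integer values 0-255.
rng = np.random.default_rng(0)
raw_img = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# flatten: (28, 28) -> (784,)
flat = raw_img.reshape(-1)

# normalize: integer gray levels 0-255 -> floats in 0.0-1.0
normalized = flat.astype(np.float32) / 255.0

print(raw_img.shape, flat.shape, normalized.dtype)
```

Stacking 60,000 such rows gives the (60000, 784) training matrix printed above.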

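2. Neural network

2.1 Batch processing

With the dataset loaded, we build a two-layer neural network (two weight matrices, one hidden layer). The numbers of input and output nodes are fixed at 784 and 10. The number of hidden nodes is not dictated by the problem and there is no strict rule for choosing it, so 50 is used here. With the structure settled, the figure below shows the computation the network performs on a single sample:
[Figure: computation for a single sample through the 784-50-10 network]

The corresponding formulas are:
a1 = x·w1 + b1
z1 = sigmoid(a1)
a2 = z1·w2 + b2
y = softmax(a2)
In practice, training the network n times on one sample at a time, across all 60,000 samples, is a very large amount of computation. The mini-batch method addresses this by randomly drawing a batch of samples at each iteration: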

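import numpy as np
from mnist import load_mnist
from TwoLayerNet import TwoLayerNet  # the network class built in section 2.4

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=True)
net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10, weight_init_std=0.01)

train_size = x_train.shape[0]  # 60000
batch_size = 100
epoch = 20000  # number of mini-batch iterations
for i in range(epoch):
    batch_mask = np.random.choice(train_size, batch_size)  # draw 100 random indices from 0..59999
    x_batch = x_train[batch_mask]  # select those rows of x_train to form one batch
    y_batch = net.predict(x_batch)  # predictions for this batch
    t_batch = t_train[batch_mask]  # matching labels, selected the same way as x_batch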
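
The index-based batch selection can be tried on placeholder arrays without downloading MNIST. The sketch below uses a small zero-filled stand-in (1,000 rows instead of 60,000) purely to show the shapes. One detail worth knowing: np.random.choice samples with replacement by default, so a batch may contain the same row twice; passing replace=False avoids that.

```python
import numpy as np

np.random.seed(0)
train_size, batch_size = 1000, 100                        # small stand-in for 60000
x_train = np.zeros((train_size, 784), dtype=np.float32)   # placeholder images
t_train = np.zeros((train_size, 10), dtype=np.float32)    # placeholder labels

# Same call as in the loop above: indices may repeat (replace=True is the default).
batch_mask = np.random.choice(train_size, batch_size)
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

# Variant that guarantees 100 distinct rows.
no_repeat_mask = np.random.choice(train_size, batch_size, replace=False)

print(x_batch.shape, t_batch.shape)
```

Using the same batch_mask for both x_train and t_train is what keeps each image paired with its label.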

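2.2 Forward propagation
For forward propagation we can write a function that takes the input data and returns the predictions:

def predict(x):
    # w1, b1, w2, b2 are the network parameters (stored on the class in section 2.4)
    a1 = np.dot(x, w1) + b1   # hidden layer pre-activation
    z1 = sigmoid(a1)          # hidden layer output
    a2 = np.dot(z1, w2) + b2  # output layer pre-activation
    y = softmax(a2)           # class probabilities
    return y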
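
A quick way to sanity-check the forward pass is to follow the array shapes through it. The sketch below is self-contained: it redefines sigmoid and a row-wise softmax locally, and uses random weights and a random batch as stand-ins, so the numbers themselves carry no meaning.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    # Row-wise softmax; subtracting the row max keeps exp() from overflowing.
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

rng = np.random.default_rng(0)
w1 = 0.01 * rng.standard_normal((784, 50))   # random stand-in weights
b1 = np.zeros(50)
w2 = 0.01 * rng.standard_normal((50, 10))
b2 = np.zeros(10)

x = rng.random((100, 784))    # one batch of 100 fake images
a1 = np.dot(x, w1) + b1       # (100, 50)
z1 = sigmoid(a1)              # (100, 50)
a2 = np.dot(z1, w2) + b2      # (100, 10)
y = softmax(a2)               # (100, 10), each row sums to 1

print(x.shape, a1.shape, a2.shape, y.shape)
```

The biases have shapes (50,) and (10,), so NumPy broadcasting adds them to every row of the batch.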

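2.2.1 The sigmoid and softmax functions

In a neural network, the input data passes through forward propagation to produce the predictions.
Activation functions are needed to compute the output of each node. The sigmoid and softmax functions are used here:

sigmoid(x) = 1 / (1 + exp(−x))
softmax: y_k = exp(a_k) / Σ_i exp(a_i)
Note that softmax does not map one input to one output independently: every output depends on all of the inputs.
The derivative of sigmoid is needed later. The code for sigmoid, its derivative, and softmax is: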

import numpy as np

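def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigmoid(x) * (1 - sigmoid(x))
    return (1.0 - sigmoid(x)) * sigmoid(x)

def softmax(x):
    if x.ndim == 2:
        # Batch input: one sample per row. Transpose so each sample is a column.
        x = x.T
        x = x - np.max(x, axis=0)  # subtract each sample's max to avoid overflow in exp
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    # Single sample
    x = x - np.max(x)  # subtract the max to avoid overflow in exp
    return np.exp(x) / np.sum(np.exp(x))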
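
The reason softmax subtracts the maximum before calling exp can be seen with large scores. The sketch below uses a hypothetical row-wise variant, softmax_rows, written for this example, and compares it with a naive version that overflows. Subtracting a constant from every score in a row leaves the result unchanged, because the factor exp(−c) cancels between numerator and denominator.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax with the max-subtraction trick (illustrative helper)."""
    shifted = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

scores = np.array([[1000.0, 1001.0, 1002.0],
                   [1.0, 2.0, 3.0]])
probs = softmax_rows(scores)

# Naive softmax on the first row: exp(1000) overflows to inf, and inf / inf is nan.
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores[0]) / np.sum(np.exp(scores[0]))

print(probs)
print(naive)
```

Both rows differ only by a constant offset, so the stable version returns the same probabilities for each, while the naive version returns nan for the large scores.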

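2.2.2 Loss function
At this point we have the network's prediction for one sample, a 1×10 matrix. How do we measure how good the network is? This is where the loss function comes in. Two common loss functions are mean squared error and cross-entropy error. The cross-entropy error is E = −Σ_k t_k·log(y_k).

Here y_k is the predicted value at the k-th output node and t_k is the one-hot value at the k-th position of the label. Taking the earlier example (the prediction for an image of a handwritten 4 and the label for 4):
y = [0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01]
t = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
Note that in the cross-entropy error only one t_k is 1 and the rest are 0, so the sum reduces to a single term. For this sample the cross-entropy error is E = −1 × log(0.8) ≈ 0.223 (natural logarithm).
Cross-entropy error is used as the loss function here. The implementation is: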

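def loss(y, t):
    # If the labels are one-hot vectors, convert them to the index of the correct class
    if t.size == y.size:
        t = t.argmax(axis=1)  # index of the largest value in each row

    batch_size = y.shape[0]  # batch size, i.e. the number of rows of y
    s = y[np.arange(batch_size), t]  # predicted probability of the correct class for each sample
    # Adding 1e-7 keeps log from returning -inf when s is 0;
    # dividing by batch_size turns the sum over the batch into a mean.
    return -np.sum(np.log(s + 1e-7)) / batch_size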
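
The worked example from this section can be checked numerically. The sketch below computes the cross-entropy error for the sample prediction and label, and also a mean squared error for comparison. The factor 1/2 in the squared error is one common convention and is an assumption here, since the article does not use that loss in code.

```python
import numpy as np

y = np.array([0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01])  # prediction
t = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])                              # one-hot label

# Cross-entropy: only the term where t_k == 1 survives, giving -log(0.8).
cross_entropy = -np.sum(t * np.log(y + 1e-7))

# Squared error with the 1/2 convention (assumed), shown for comparison only.
mean_squared = 0.5 * np.sum((y - t) ** 2)

print(cross_entropy, mean_squared)
```

The cross-entropy value comes out at about 0.223, matching −log(0.8), and it grows quickly as the probability assigned to the correct class shrinks.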

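2.2.3 Recognition accuracy

The code is short enough to speak for itself:

def accuracy(x, t):
    y = predict(x)  # y is a 100×10 matrix, because a batch of 100 samples was selected earlier
    p = np.argmax(y, axis=1)  # index of the largest value in each row of y: an array of 100 predicted classes
    q = np.argmax(t, axis=1)  # index of the largest value in each row of t: an array of 100 true classes
    acc = np.sum(p == q) / len(y)  # count the matches (True counts as 1), then divide by the number of samples
    return acc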
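
A small hand-checkable case shows what the argmax comparison does. The four predictions and labels below are made up for illustration and use three classes instead of ten.

```python
import numpy as np

# Four made-up predictions over three classes (each row sums to 1).
y = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.5, 0.3]])
# Matching one-hot labels.
t = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

p = np.argmax(y, axis=1)        # predicted classes: 1, 0, 2, 1
q = np.argmax(t, axis=1)        # true classes:      1, 0, 1, 2
acc = np.sum(p == q) / len(y)   # two of four match

print(p, q, acc)
```

The first two samples are classified correctly and the last two are not, so the accuracy is 0.5.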

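That completes the forward pass. To recap: the goal is to find w1, b1, w2, b2 such that the predictions for the input data match the labels as closely as possible. The loss function measures how well the network does this, so we take the partial derivatives of the loss with respect to w1, b1, w2, b2 to form the gradient. Intuitively, the gradient tells us how strongly a change in each of w1, b1, w2, b2 affects the value of the loss. The parameters from the first iteration are then updated using these partial derivatives (not by simply adding them; the update rule is introduced later), and the second iteration uses the updated w1, b1, w2, b2. This step is called backpropagation. One forward pass plus one backward pass makes up one iteration. The next section covers stochastic gradient descent in backpropagation.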
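
The update rule can be illustrated on a one-parameter toy problem before applying it to the network. The sketch below minimizes f(w) = (w − 3)², whose derivative is 2(w − 3); the function, starting point, and learning rate are all chosen for illustration and are not part of the article's network.

```python
# Gradient descent on f(w) = (w - 3)^2, a toy stand-in for the loss.
def grad(w):
    return 2 * (w - 3)      # derivative of (w - 3)^2

w = 0.0                     # arbitrary starting point
lr = 0.1                    # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)       # step against the gradient, scaled by lr

print(w)
```

The parameter moves toward the minimum at w = 3. The gradient is subtracted rather than added, and it is scaled by the learning rate, which is the sense in which the update is "not simply adding" the derivative.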

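2.3 Backpropagation

Computing the gradient
The partial derivatives are obtained with the chain rule. First look at the variables that sit between the prediction y and w1, b1, w2, b2:
[Figure: chain of intermediate variables x → a1 → z1 → a2 → y → loss]
The partial derivatives of the loss L with respect to w1, b1, w2, b2, for a batch of N samples, are:
∂L/∂a2 = (y − t) / N
∂L/∂w2 = z1ᵀ · ∂L/∂a2
∂L/∂b2 = sum of ∂L/∂a2 over the batch rows
∂L/∂z1 = ∂L/∂a2 · w2ᵀ
∂L/∂a1 = ∂L/∂z1 * sigmoid(a1) * (1 − sigmoid(a1))
∂L/∂w1 = xᵀ · ∂L/∂a1
∂L/∂b1 = sum of ∂L/∂a1 over the batch rows

Take care with matrix derivatives: after computing each one, check that its shape matches the parameter it belongs to. Also distinguish between the matrix product (·) and the element-wise product (*) in the formulas above.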
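
One common way to check hand-derived gradients is to compare them with central finite differences on a tiny network. The sketch below does this for a 4-5-3 network with random data. All sizes, names, and helper functions are assumptions made for the example; the loss omits the 1e-7 term so that the analytic gradient matches it exactly.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

def forward_loss(params, x, t):
    """Mean cross-entropy loss of the two-layer network."""
    z1 = sigmoid(np.dot(x, params['w1']) + params['b1'])
    y = softmax(np.dot(z1, params['w2']) + params['b2'])
    return -np.sum(t * np.log(y)) / x.shape[0]

def analytic_grads(params, x, t):
    """Gradients from the chain rule, for one-hot labels t."""
    a1 = np.dot(x, params['w1']) + params['b1']
    z1 = sigmoid(a1)
    y = softmax(np.dot(z1, params['w2']) + params['b2'])
    d_a2 = (y - t) / x.shape[0]
    d_z1 = np.dot(d_a2, params['w2'].T)
    d_a1 = d_z1 * z1 * (1 - z1)          # element-wise product
    return {'w2': np.dot(z1.T, d_a2), 'b2': d_a2.sum(axis=0),
            'w1': np.dot(x.T, d_a1), 'b1': d_a1.sum(axis=0)}

def numerical_grads(params, x, t, h=1e-5):
    """Central differences: perturb one parameter entry at a time."""
    grads = {}
    for key, value in params.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + h
            loss_plus = forward_loss(params, x, t)
            value[idx] = original - h
            loss_minus = forward_loss(params, x, t)
            value[idx] = original        # restore the entry
            grad[idx] = (loss_plus - loss_minus) / (2 * h)
        grads[key] = grad
    return grads

rng = np.random.default_rng(0)
params = {'w1': 0.1 * rng.standard_normal((4, 5)), 'b1': np.zeros(5),
          'w2': 0.1 * rng.standard_normal((5, 3)), 'b2': np.zeros(3)}
x = rng.random((6, 4))                          # 6 fake samples
t = np.eye(3)[rng.integers(0, 3, size=6)]       # random one-hot labels

ana = analytic_grads(params, x, t)
num = numerical_grads(params, x, t)
max_diff = max(np.max(np.abs(ana[k] - num[k])) for k in params)
print(max_diff)
```

If the largest difference is tiny, the analytic gradients agree with the numerical ones. The numerical version is far too slow for the 784-50-10 network, which is why backpropagation is used for training and finite differences only for checking.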

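2.4 Building the neural network
So far we have defined the prediction (predict), the loss function (loss), the recognition accuracy (accuracy), and the gradient (grad). Next we build a neural network class and add these as methods: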

import numpy as np
from functions import sigmoid, sigmoid_grad, softmax, loss

class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std):
        # Initialize the weights
        self.dict = {}  # dictionary that stores w1, b1, w2, b2
        self.dict['w1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.dict['b1'] = np.zeros(hidden_size)
        self.dict['w2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.dict['b2'] = np.zeros(output_size)

    def predict(self, x):
        w1, w2 = self.dict['w1'], self.dict['w2']
        b1, b2 = self.dict['b1'], self.dict['b2']

        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, w2) + b2
        y = softmax(a2)

        return y
        
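    def loss(self, y, t):
        # If the labels are one-hot, convert them to class indices
        if t.size == y.size:
            t = t.argmax(axis=1)

        batch_size = y.shape[0]

        # Mean cross-entropy over the batch; 1e-7 avoids log(0)
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size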

    def gradient(self, x, t):
        w1, w2 = self.dict['w1'], self.dict['w2']
        b1, b2 = self.dict['b1'], self.dict['b2']
        grads = {}

        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, w2) + b2
        y = softmax(a2)

        num = x.shape[0]
        dy = (y - t) / num
        grads['w2'] = np.dot(z1.T, dy)
        grads['b2'] = np.sum(dy, axis=0)

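        # Backpropagate through the second layer to the hidden outputs z1
        dz1 = np.dot(dy, w2.T)
        # Then through the sigmoid to the hidden pre-activations a1
        da1 = sigmoid_grad(a1) * dz1
        grads['w1'] = np.dot(x.T, da1)
        grads['b1'] = np.sum(da1, axis=0)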

        return grads

    def accuracy(self,x,t):
        y = self.predict(x)
        p = np.argmax(y, axis=1)
        q = np.argmax(t, axis=1)
        acc = np.sum(p == q) / len(y)
        return acc

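3. Training the neural network

The network can now compute predictions, the loss, the accuracy, and the gradients needed for stochastic gradient descent, so all that remains is to set the number of iterations. To monitor how training is going, the accuracy on the training and test data is also recorded at regular intervals. The implementation is: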

import numpy as np
import matplotlib.pyplot as plt
from TwoLayerNet import TwoLayerNet
from mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)
net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10, weight_init_std=0.01)

epoch = 20000  # number of mini-batch iterations
batch_size = 100
lr = 0.1  # learning rate

train_size = x_train.shape[0]  # 60000
iter_per_epoch = max(train_size // batch_size, 1)  # 600 iterations cover the data once

train_loss_list = []
train_acc_list = []
test_acc_list = []

for i in range(epoch):
    batch_mask = np.random.choice(train_size, batch_size)  # draw 100 random indices from 0..59999
    x_batch = x_train[batch_mask]
    y_batch = net.predict(x_batch)
    t_batch = t_train[batch_mask]
    grad = net.gradient(x_batch, t_batch)

    # Gradient descent step: move each parameter against its gradient
    for key in ('w1', 'b1', 'w2', 'b2'):
        net.dict[key] -= lr * grad[key]
    loss = net.loss(y_batch, t_batch)  # loss of this batch, computed from the pre-update predictions
    train_loss_list.append(loss)

    # Once every iter_per_epoch iterations, record the accuracy and print the current loss
    if i % iter_per_epoch == 0:
        train_acc = net.accuracy(x_train, t_train)
        test_acc = net.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(f'iteration {i + 1}: train_acc, test_acc, loss: {train_acc}, {test_acc}, {loss}')

# Plot accuracy against the number of recorded epochs
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()
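
The relationship between the 20000 iterations and the "epochs" on the plot's x-axis is simple arithmetic, sketched below with the same settings as the training script. The variable names iterations and eval_steps are introduced here only for illustration.

```python
train_size, batch_size, iterations = 60000, 100, 20000

# Number of mini-batches needed to see 60000 samples once.
iter_per_epoch = max(train_size // batch_size, 1)

# Iterations at which the training loop records train/test accuracy.
eval_steps = [i for i in range(iterations) if i % iter_per_epoch == 0]

print(iter_per_epoch)                 # iterations per epoch
print(len(eval_steps))                # number of points on the accuracy plot
print(iterations / iter_per_epoch)    # how many passes over the data this amounts to
```

With these settings the accuracy is recorded 34 times, at iterations 0, 600, ..., 19800, and the run corresponds to roughly 33 passes over the training set. Since batches are drawn at random, a "pass" here means 60,000 samples drawn, not that every sample is seen exactly once.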