Stochastic Gradient Descent (SGD): Principles and Implementation

程序员文章站 (Programmer Article Station) 2022-03-04 20:31:28


backpropagation

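The core problem that backpropagation solves is computing the partial derivatives of the cost function C with respect to w and b, where C = cost(w, b).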

Overall, the computation in each layer has two steps:

1. z = w*a' + b
2. a = sigmoid(z)
Here a' is the output of the previous layer and a is the output of the current layer.
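The two steps above can be sketched for a single layer as follows. This is only a minimal illustration: the layer size (3 inputs, 2 neurons) and all numeric values are made up for the example and do not come from the article.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer with 3 inputs and 2 neurons (all values are illustrative)
a_prev = np.array([[0.5], [-1.0], [2.0]])   # a': output of the previous layer, shape (3, 1)
w = np.array([[0.1, 0.2, 0.3],
              [0.0, 0.0, 0.0]])             # weights, shape (2, 3)
b = np.array([[0.0], [1.0]])                # biases, shape (2, 1)

z = np.dot(w, a_prev) + b                   # step 1: weighted input
a = sigmoid(z)                              # step 2: activation of the current layer

print(z.ravel())
print(a.ravel())
```

The column-vector shapes follow the convention of the Network class in this article, where each bias is stored with shape (y, 1).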
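The backpropagation algorithm then proceeds as follows:
1. Feed in the input x and run one forward pass, which yields the a values of every layer.
2. Compute the output-layer error delta = (a - y) multiplied elementwise by the derivative of sigmoid(z) with respect to z. The sign matches derivate() in the code, which returns output - y, the derivative of the quadratic cost.
3. Compute the error delta of each layer before the output layer. Each layer's delta is the partial derivative of the cost function with respect to that layer's b.
4. Then, using formula 4, compute the partial derivative with respect to w. In the code this is the product of delta with the transposed output of the previous layer.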
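As a sanity check on the four steps, the sketch below runs them by hand on a tiny network and compares one weight gradient against a finite-difference estimate. The 3-4-2 layer sizes, the random data, and the quadratic cost C = 0.5 * ||a - y||^2 are assumptions made for illustration; the quadratic cost is chosen because it yields the output error (a - y) * sigmoid'(z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(w1, b1, w2, b2, x, y):
    # Quadratic cost C = 0.5 * ||a2 - y||^2 (assumed for this sketch)
    a1 = sigmoid(np.dot(w1, x) + b1)
    a2 = sigmoid(np.dot(w2, a1) + b2)
    return 0.5 * np.sum((a2 - y) ** 2)

# Hypothetical 3-4-2 network and a single made-up training example
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
y = np.array([[1.0], [0.0]])
w1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
w2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

# Step 1: forward pass, keeping every z and a
z1 = np.dot(w1, x) + b1
a1 = sigmoid(z1)
z2 = np.dot(w2, a1) + b2
a2 = sigmoid(z2)

# Step 2: output-layer error
delta2 = (a2 - y) * sigmoid_prime(z2)
# Step 3: error of the earlier layer (also the gradient with respect to b1)
delta1 = np.dot(w2.T, delta2) * sigmoid_prime(z1)
# Step 4: gradients with respect to the weights
grad_w2 = np.dot(delta2, a1.T)
grad_w1 = np.dot(delta1, x.T)

# Central finite difference on one entry of w1 as an independent check
eps = 1e-6
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[0, 0] += eps
w1_minus[0, 0] -= eps
numeric = (cost(w1_plus, b1, w2, b2, x, y)
           - cost(w1_minus, b1, w2, b2, x, y)) / (2 * eps)
print(numeric, grad_w1[0, 0])
```

If the two printed numbers agree to several decimal places, the hand-written steps are consistent with the cost function. Checking only one entry keeps the sketch short; a fuller check would loop over every weight and bias.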
Detailed derivation of the formulas

import numpy as np
import random

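class Network(object):
    def __init__(self, sizes):
        # sizes lists the number of neurons in each layer, e.g. [2, 3, 1]
        self.number_layers = len(sizes)
        self.sizes = sizes
        # One bias vector and one weight matrix per non-input layer,
        # drawn from a standard normal distribution
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]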
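    def feedforward(self,a):
        # Propagate the input a through every layer and return the network output
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a
    def evaluate(self,test_data):
        # Count the test inputs whose most active output neuron equals the label y
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
    def derivate(self,output,y):
        # Derivative of the quadratic cost with respect to the output activations
        return (output-y)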
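    def backprop(self,x,y):
        # Return (nabla_b, nabla_w): per-layer gradients of the cost for one example (x, y)
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # Forward pass: store every z and every activation, layer by layer
        activation = x
        activations = [x]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # Output-layer error: (a - y) times sigmoid'(z), elementwise
        delta = self.derivate(activations[-1], y) * sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Backward pass: i = 2 is the second-to-last layer, i = 3 the one before it, ...
        for i in range(2,self.number_layers):
            z = zs[-i]
            ps = sigmoid_prime(z)
            # Pull the following layer's error back through its weights
            delta = np.dot(self.weights[-i+1].transpose(), delta) * ps
            nabla_b[-i] = delta
            nabla_w[-i] = np.dot(delta, activations[-i-1].transpose())
        return nabla_b, nabla_w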
    def update_mini_batch(self, mini_batch, eta):
        # Accumulate the gradients of every example in the mini-batch
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # Gradient descent step with learning rate eta, using the average gradient
        self.weights = [w - (eta/len(mini_batch) * nw) for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta/len(mini_batch) * nb) for b, nb in zip(self.biases, nabla_b)]
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        # training_data and test_data must be lists (not iterators such as zip objects),
        # because they are shuffled in place and measured with len()
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            # Reshuffle every epoch so that the mini-batches differ between epochs
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)
            ]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print('Epoch {0}: {1}/{2}'.format(j, self.evaluate(test_data), n_test))
            else:
                print('Epoch {0} complete!'.format(j))


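def sigmoid(z):
    # Logistic function, applied elementwise to a NumPy array
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid function with respect to z
    return sigmoid(z) * (1 - sigmoid(z))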
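The update rule used by update_mini_batch and SGD above (shuffle, split into mini-batches, step against the averaged gradient) can be seen in isolation on a much simpler model. The sketch below applies it to one-variable linear regression. The toy data y = 2x + 1, the hyperparameters, and the per-example cost 0.5 * (w*x + b - y)^2 are all assumptions chosen for illustration, not part of the article's network.

```python
import random
import numpy as np

# Hypothetical noise-free toy data generated from y = 2x + 1
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=200)
data = [(x, 2.0 * x + 1.0) for x in xs]

w, b = 0.0, 0.0                               # parameters to learn
eta, mini_batch_size, epochs = 0.5, 10, 50    # illustrative hyperparameters

random.seed(0)
for epoch in range(epochs):
    random.shuffle(data)                      # new mini-batches every epoch
    for k in range(0, len(data), mini_batch_size):
        batch = data[k:k + mini_batch_size]
        # Sum the per-example gradients of C = 0.5 * (w*x + b - y)^2
        grad_w = sum((w * x + b - y) * x for x, y in batch)
        grad_b = sum((w * x + b - y) for x, y in batch)
        # Same form as update_mini_batch: step by eta times the average gradient
        w -= eta / len(batch) * grad_w
        b -= eta / len(batch) * grad_b

print(w, b)
```

Because the data are noise-free, w and b should end up close to 2 and 1. With noisy data the estimates would keep fluctuating around the best fit unless eta is reduced over time.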
