Residual Networks with Keras
1 Introduction
In theory, the deeper a network is, the better it should fit the data. In practice, however, adding layers causes vanishing or exploding gradients, and once a network is already very deep, the extra layers would ideally need to behave as an identity mapping just to avoid degrading performance. Experiments show that neural networks learn a residual F(x) = H(x) - x more easily than they learn the identity mapping itself, so each residual block computes H(x) = F(x) + x. This is why residual networks are widely used in deep models.
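As a minimal sketch of this idea in the Keras functional API (the layer width of 784 here is an arbitrary choice for illustration, not part of the article's model), a residual connection simply adds the block's input back onto the residual branch's output:

from keras.layers import Input, Dense, add

x = Input(shape=(784,))               # block input
f = Dense(784, activation='relu')(x)  # residual branch F(x)
h = add([f, x])                       # H(x) = F(x) + x via a skip connection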
This article uses MNIST handwritten digit classification as its example. To keep the residual network easy to follow, the model contains only fully connected layers and no convolutional layers. For an introduction to the MNIST dataset, see the earlier article "MNIST Dataset Classification with TensorFlow".
2 Experiment
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Model
from keras.layers import add, Input, Dense, Activation, Layer

# Load the data
def read_data(path):
    mnist = input_data.read_data_sets(path, one_hot=True)
    train_x, train_y = mnist.train.images, mnist.train.labels
    valid_x, valid_y = mnist.validation.images, mnist.validation.labels
    test_x, test_y = mnist.test.images, mnist.test.labels
    return train_x, train_y, valid_x, valid_y, test_x, test_y
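Note that tensorflow.examples.tutorials was removed in TensorFlow 2.x, so the import above only works on TensorFlow 1.x. If it fails, a rough equivalent (my own sketch, not part of the original article) can be built from keras.datasets.mnist by flattening the images and one-hot encoding the labels:

from keras.datasets import mnist
from keras.utils import to_categorical

def read_data_tf2(path=None):
    # path is ignored; keras.datasets downloads and caches MNIST itself
    (train_x, train_y), (test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(-1, 784).astype('float32') / 255.0
    test_x = test_x.reshape(-1, 784).astype('float32') / 255.0
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
    # Carve a 5000-sample validation split off the 60000 training samples,
    # mirroring the 55000/5000/10000 layout of the original loader
    valid_x, valid_y = train_x[-5000:], train_y[-5000:]
    train_x, train_y = train_x[:-5000], train_y[:-5000]
    return train_x, train_y, valid_x, valid_y, test_x, test_y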
# Subclassed residual block
class ResBlock(Layer):
    def __init__(self, hidden_size1, hidden_size2):
        super(ResBlock, self).__init__()
        # Define the block's layers
        self.dense1 = Dense(hidden_size1, activation='relu')  # first hidden layer
        self.dense2 = Dense(hidden_size2)                     # second hidden layer
        self.dense_short = Dense(hidden_size2)                # projection for the shortcut
        self.act = Activation('relu')                         # activation function
        self.hidden_size2 = hidden_size2
    def call(self, inputs):  # forward pass
        x = self.dense1(inputs)
        x = self.dense2(x)
        if inputs.shape[1] == self.hidden_size2:  # shapes match: identity shortcut
            x = add([x, inputs])
        else:                                     # shapes differ: project the input first
            shortcut = self.dense_short(inputs)
            x = add([x, shortcut])
        x = self.act(x)
        return x
    def compute_output_shape(self, input_shape):  # output shape of the block
        return (input_shape[0], self.hidden_size2)
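To see both shortcut paths in isolation, the block can be applied to a placeholder tensor (a quick sketch; the sizes match the model built below):

inp = Input(shape=(784,))
y1 = ResBlock(100, 200)(inp)  # 784 != 200, so the shortcut is projected by dense_short
y2 = ResBlock(100, 200)(y1)   # 200 == 200, so the input is added back directly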
def ResNet(train_x, train_y, valid_x, valid_y, test_x, test_y):
    inputs = Input(shape=(784,))
    x = ResBlock(100, 200)(inputs)
    x = ResBlock(100, 200)(x)
    x = Dense(10, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=x)
    # Print the network structure
    model.summary()
    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Train the model
    model.fit(train_x, train_y, batch_size=500, epochs=100, verbose=1, validation_data=(valid_x, valid_y))
    # Evaluate the model
    pre = model.evaluate(test_x, test_y, batch_size=500, verbose=1)
    print('test_loss:', pre[0], '- test_acc:', pre[1])

train_x, train_y, valid_x, valid_y, test_x, test_y = read_data('MNIST_data')
ResNet(train_x, train_y, valid_x, valid_y, test_x, test_y)
Output:
Epoch 97/100
55000/55000 [==============================] - 1s 12us/step - loss: 0.3196 - acc: 0.9040 - val_loss: 0.3258 - val_acc: 0.9014
Epoch 98/100
55000/55000 [==============================] - 1s 13us/step - loss: 0.3193 - acc: 0.9047 - val_loss: 0.3254 - val_acc: 0.9018
Epoch 99/100
55000/55000 [==============================] - 1s 13us/step - loss: 0.3188 - acc: 0.9043 - val_loss: 0.3262 - val_acc: 0.9018
Epoch 100/100
55000/55000 [==============================] - 1s 12us/step - loss: 0.3187 - acc: 0.9046 - val_loss: 0.3249 - val_acc: 0.9022
10000/10000 [==============================] - 0s 9us/step
test_loss: 0.3100854724645615 - test_acc: 0.9060000002384185
Notes:
The compute_output_shape() method of the ResBlock class must not be omitted; without it Keras cannot infer the block's output shape and raises:
ValueError: Dimensions must be equal, but are 200 and 784 for 'res_block_88/dense_303/MatMul' (op: 'MatMul') with input shapes: [?,200], [784,100].