TensorFlow Cat-vs-Dog Recognition Explained (Part 3) -- The Training Process
Reflections
Over this period I tested several network architectures side by side (LeNet, AlexNet, VGG16, ...), and it drove home how important hyperparameter tuning is. For LeNet alone, I found that with:
batch_size = 32
lr = 0.0001
max_step = 6000~10000
the loss converges fairly quickly, and a model trained for 6,000 to 10,000 steps performs well. I had previously trained for 100,000 steps overnight; the next morning I was delighted to see train_acc hit 100%, but the accuracy on unseen images was only about 75% -- the model had badly overfit. Training for 8,000 steps instead gave very good predictions: out of 100 images pulled from the web, almost none were misclassified.
If lr is set too small, the model fails to converge or converges too slowly; if it is set too large, training oscillates.
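A common way to reconcile these two failure modes is to start with a larger rate and let it decay as training progresses. My script below just uses a fixed rate, but here is a minimal sketch of schedule-based decay using TensorFlow 1.x's tf.train.exponential_decay. The initial rate, decay interval, decay factor, and the choice of AdamOptimizer are illustrative assumptions, not values from my experiments, and the toy loss is only there so the snippet runs on its own.

    import tensorflow as tf

    global_step = tf.Variable(0, trainable=False, name='global_step')

    # Stand-in scalar loss so this snippet is self-contained; in the real
    # script this would be the train_loss returned by model.losses().
    loss = tf.get_variable('toy_loss', shape=[], initializer=tf.ones_initializer())

    # Start at 1e-3 and multiply the rate by 0.9 every 1000 steps;
    # staircase=True applies the decay in discrete jumps.
    decayed_lr = tf.train.exponential_decay(learning_rate=0.001,
                                            global_step=global_step,
                                            decay_steps=1000,
                                            decay_rate=0.9,
                                            staircase=True)

    train_op = tf.train.AdamOptimizer(decayed_lr).minimize(loss, global_step=global_step)

Passing global_step to minimize() makes the optimizer increment it automatically, so the schedule advances with every training step.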
The training code is given below, with detailed comments inline.
Code
import os
import time

import numpy as np
import tensorflow as tf

import test   # data-loading module from the previous part
import model  # network definition

N_CLASSES = 2           # cats and dogs
IMG_W = 208             # input width after resizing
IMG_H = 208             # input height after resizing
BATCH_SIZE = 16
CAPACITY = 2000         # number of elements held in the input queue
MAX_STEP = 8000
learning_rate = 0.0001  # keep below 0.001

print("I'm OK")

train_dir = 'E:\\Pycharm\\tf-01\\Bigwork\\train\\'           # folder of training images
logs_train_dir = 'E:\\Pycharm\\tf-01\\Bigwork\\savenet02\\'  # folder for checkpoints and logs

train, train_label = test.get_files(train_dir)
train_batch, train_label_batch = test.get_batch(train,
                                                train_label,
                                                IMG_W,
                                                IMG_H,
                                                BATCH_SIZE,
                                                CAPACITY)

# Define the training ops
sess = tf.Session()
train_logits = model.inference(train_batch, BATCH_SIZE, N_CLASSES)
train_loss = model.losses(train_logits, train_label_batch)
train_op = model.trainning(train_loss, learning_rate)
train_acc = model.evaluation(train_logits, train_label_batch)
# train_label_batch = tf.one_hot(train_label_batch, 2, 1, 0)

# Define the summary / logging ops
summary_op = tf.summary.merge_all()
# Create a writer for the log files
train_writer = tf.summary.FileWriter(logs_train_dir, sess.graph)
saver = tf.train.Saver()

sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
# Start the queue runners that feed the input pipeline -- very important,
# otherwise sess.run() on the batches blocks forever.
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

tra_loss = .0
tra_acc = .0
# val_loss = .0
# val_acc = .0

try:
    start = time.time()  # time each group of steps (time.clock() was removed in Python 3.8)
    for step in np.arange(MAX_STEP):
        if coord.should_stop():
            break
        _, tra_loss_, tra_acc_ = sess.run([train_op, train_loss, train_acc])
        # val_loss_, val_acc_ = sess.run([test_loss, test_acc])

        # The block below prints the argmax of the network's last layer;
        # it is only for debugging and can be removed.
        '''
        train, label = sess.run([train_logits, train_label_batch])
        # print(train)
        L = []
        for i in train:
            max_ = np.argmax(i)
            L.append(max_)
        print(L)
        print(label)
        '''
        tra_loss = tra_loss + tra_loss_
        tra_acc = tra_acc + tra_acc_
        # val_loss = val_loss + val_loss_
        # val_acc = val_acc + val_acc_

        # Every 50 steps, print the averaged loss/accuracy and write a summary
        if (step + 1) % 50 == 0:
            end = time.time()
            print('Step %d, train loss = %.2f, train accuracy = %.2f%%'
                  % (step + 1, tra_loss / 50, tra_acc * 100.0 / 50))
            # print('Step %d, val loss = %.2f, val accuracy = %.2f%%'
            #       % (step, val_loss / 50, val_acc * 100.0 / 50))
            print(str(end - start))
            tra_loss = .0
            tra_acc = .0
            summary_str = sess.run(summary_op)
            train_writer.add_summary(summary_str, step)
            start = time.time()

        # Every 2000 steps (and at the final step), save a checkpoint
        if step % 2000 == 0 or step == MAX_STEP - 1:
            checkpoint_path = os.path.join(logs_train_dir, 'model.ckpt')
            saver.save(sess, checkpoint_path, global_step=step)
except tf.errors.OutOfRangeError:
    print('Done training -- epoch limit reached')
finally:
    coord.request_stop()
    coord.join(threads)
    sess.close()
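Each saver.save call above writes a model.ckpt-NNNN checkpoint into logs_train_dir. Here is a minimal restore sketch, assuming the graph is rebuilt with the same model.inference call as in the training script; the placeholder name 'images' is mine, and the shapes simply reuse the constants defined above.

    import tensorflow as tf
    import model

    logs_train_dir = 'E:\\Pycharm\\tf-01\\Bigwork\\savenet02\\'

    # Rebuild the same graph, then load the most recent saved weights into it.
    images = tf.placeholder(tf.float32, shape=[16, 208, 208, 3])
    logits = model.inference(images, 16, 2)

    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint(logs_train_dir)  # e.g. model.ckpt-7999
        saver.restore(sess, ckpt)
        # sess.run(logits, feed_dict={images: ...}) now yields predictions.

The summaries written by train_writer can likewise be inspected by pointing TensorBoard at the same directory, to watch whatever scalars the model module records over the course of training.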
Start Training
For this run I printed results every 10 steps; by a little over 2,000 steps, the training accuracy had already reached 84%.
Next post: Prediction