TensorFlow 2.0 Getting Started: Data Standardization (Study Notes)
In deep learning, image data usually needs to be normalized or standardized.
Using the standardization utilities in sklearn simplifies this preprocessing step.
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
2.0.0
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all,y_train_all),(x_test,y_test) = fashion_mnist.load_data()
x_valid,x_train = x_train_all[:5000],x_train_all[5000:]
y_valid,y_train = y_train_all[:5000],y_train_all[5000:]
print(x_valid.shape,y_valid.shape)
print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)
(5000, 28, 28) (5000,)
(55000, 28, 28) (55000,)
(10000, 28, 28) (10000,)
print(np.max(x_train),np.min(x_train))
255 0
Normalization maps the data into the range [0, 1].
Standardization removes the measurement bias introduced by the data distribution; the result follows the standard normal distribution N(0, 1).
# x = (x - mu) / std, giving mean 0 and variance 1
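As a quick illustration of the difference (a toy example added here, not part of the original note, using the numpy imported above):
toy = np.array([0., 64., 128., 255.], dtype=np.float32)
# min-max normalization: map values into [0, 1]
toy_norm = (toy - toy.min()) / (toy.max() - toy.min())
# standardization: subtract the mean, divide by the standard deviation
toy_std = (toy - toy.mean()) / toy.std()
print(toy_norm)                       # all values fall in [0, 1]
print(toy_std.mean(), toy_std.std())  # approximately 0 and 1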
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# x_train: [None,28,28] -> [None,784] -> [None,28,28]; reshape(-1,1) infers the number of rows, with a single column
# fit_transform expects a 2-D input
# fit computes and stores the mean and std of the training set; the same statistics are reused for the validation and test sets
# the raw pixels are integers in 0-255 and standardization involves division, so cast to float32 first
x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1,1)).reshape(-1,28,28)
print(np.max(x_train_scaled),np.min(x_train_scaled))
2.0231433 -0.8105136
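As an illustrative check (not in the original note): after fit_transform, the scaler keeps the training-set statistics in scaler.mean_ and scaler.scale_, and applying x = (x - mu) / std by hand reproduces the scaled training data.
print(scaler.mean_, scaler.scale_)  # mean and std learned from the training pixels
manual = (x_train.astype(np.float32).reshape(-1, 1) - scaler.mean_) / scaler.scale_
print(np.allclose(manual.reshape(-1, 28, 28), x_train_scaled))  # True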
#tf.keras.Sequential()
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))   # flatten the 28x28 input image into a 1-D vector of 784 values
model.add(keras.layers.Dense(300,activation="relu"))   # fully connected layer, 300 units, ReLU activation
model.add(keras.layers.Dense(100,activation="relu"))
model.add(keras.layers.Dense(10,activation="softmax")) # output layer: fully connected, 10 classes; softmax gives per-class probabilities
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
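A note on the loss: y_train holds integer class indices (0-9), which is what sparse_categorical_crossentropy expects. If the labels were one-hot encoded instead, categorical_crossentropy would be the matching choice (sketch below, not part of the original note):
# y_train_onehot = keras.utils.to_categorical(y_train, num_classes=10)
# model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])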
model.layers
model.summary() # prints a summary of the model
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 300) 235500
_________________________________________________________________
dense_1 (Dense) (None, 100) 30100
_________________________________________________________________
dense_2 (Dense) (None, 10) 1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
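The parameter counts in the summary follow directly from the layer sizes: a Dense layer has inputs * units weights plus units biases.
print(784 * 300 + 300)  # 235500 parameters in dense
print(300 * 100 + 100)  # 30100  parameters in dense_1
print(100 * 10 + 10)    # 1010   parameters in dense_2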
history = model.fit(x_train_scaled,y_train,epochs=10,
                    validation_data=(x_valid_scaled,y_valid))
# epochs: number of complete passes over the training set
Train on 55000 samples, validate on 5000 samples
Epoch 1/10
55000/55000 [==============================] - 6s 106us/sample - loss: 0.5310 - accuracy: 0.8142 - val_loss: 0.3946 - val_accuracy: 0.8606
Epoch 2/10
55000/55000 [==============================] - 5s 84us/sample - loss: 0.3895 - accuracy: 0.8612 - val_loss: 0.3797 - val_accuracy: 0.8628
Epoch 3/10
55000/55000 [==============================] - 5s 90us/sample - loss: 0.3516 - accuracy: 0.8735 - val_loss: 0.3500 - val_accuracy: 0.8778
Epoch 4/10
55000/55000 [==============================] - 5s 88us/sample - loss: 0.3267 - accuracy: 0.8832 - val_loss: 0.3321 - val_accuracy: 0.8786
Epoch 5/10
55000/55000 [==============================] - 5s 84us/sample - loss: 0.3088 - accuracy: 0.8886 - val_loss: 0.3473 - val_accuracy: 0.8698
Epoch 6/10
55000/55000 [==============================] - 5s 90us/sample - loss: 0.2933 - accuracy: 0.8941 - val_loss: 0.3127 - val_accuracy: 0.8862
Epoch 7/10
55000/55000 [==============================] - 5s 89us/sample - loss: 0.2795 - accuracy: 0.8994 - val_loss: 0.3015 - val_accuracy: 0.8894
Epoch 8/10
55000/55000 [==============================] - 5s 82us/sample - loss: 0.2656 - accuracy: 0.9041 - val_loss: 0.3118 - val_accuracy: 0.8864
Epoch 9/10
55000/55000 [==============================] - 5s 93us/sample - loss: 0.2561 - accuracy: 0.9072 - val_loss: 0.3165 - val_accuracy: 0.8878
Epoch 10/10
55000/55000 [==============================] - 5s 86us/sample - loss: 0.2463 - accuracy: 0.9109 - val_loss: 0.2931 - val_accuracy: 0.8936
def plot_learning_curve(history):
    # plot the per-epoch metrics directly from a DataFrame
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid(True)
    plt.gca().set_ylim(0,1)
    plt.show()
plot_learning_curve(history)
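history.history is a plain dict of per-epoch metric lists, which is what the DataFrame plot above is built from (quick check, not in the original note):
print(history.history.keys())  # loss, accuracy, val_loss, val_accuracy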
# evaluate on the test set
y = model.evaluate(x_test_scaled,y_test)
0s 44us/sample - loss: 0.2366 - accuracy: 0.8833
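Besides evaluate, per-sample predictions can be read off the softmax outputs; a minimal sketch (not part of the original note):
probs = model.predict(x_test_scaled)  # one softmax distribution per test image
y_pred = np.argmax(probs, axis=1)     # predicted class index
print(y_pred[:10])
print(y_test[:10])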