
Reading Training Data in Batches for Training

程序员文章站 2024-01-30 20:05:10

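When the training set is too large to fit in the device's memory, it cannot be fed to the network all at once, so the training data has to be read in batches.

train_x holds the file paths of the training images, train_y the training labels, val_X the validation data (already loaded into memory), and val_y the validation labels.

The batch-reading function is as follows: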


import math

import cv2
import numpy as np


def dataset_split(images, labels, batch_size):
    """Yield (X, y) batches forever, reading the images from disk on demand."""
    n = math.ceil(len(images) / batch_size)  # number of batches per pass over the data
    while True:
        for j in range(n):
            if j != n - 1:
                start = j * batch_size
            else:
                # Last batch: take the final batch_size samples so that every batch
                # has the same size (it may overlap the previous batch).
                start = max(len(images) - batch_size, 0)
            paths = images[start:start + batch_size]
            # cv2.imread returns a NumPy array (BGR channel order), or None if the
            # file cannot be read. All images must have the same shape to be stacked.
            X = np.array([cv2.imread(p) for p in paths])
            y = np.array(labels[start:start + batch_size])
            yield X, y
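
The index arithmetic behind the generator can be checked without any image files. The following is a minimal sketch using only the standard library; `batch_bounds` is a hypothetical helper that is not part of the original post, and the sample counts are made up for illustration. It lists the (start, stop) slice of each batch in one pass and shows how the final batch is taken from the tail of the data.

```python
import math


def batch_bounds(n_samples, batch_size):
    """Return the (start, stop) index pair of every batch in one pass.

    Hypothetical helper for illustration only. All batches except the last
    start at j * batch_size; the last one is taken from the tail of the data,
    so it may overlap the batch before it.
    """
    n_batches = math.ceil(n_samples / batch_size)
    bounds = []
    for j in range(n_batches):
        if j != n_batches - 1:
            start = j * batch_size
        else:
            start = max(n_samples - batch_size, 0)
        bounds.append((start, min(start + batch_size, n_samples)))
    return bounds


# 10 samples with a batch size of 4: the last batch reuses samples 6 and 7.
print(batch_bounds(10, 4))  # [(0, 4), (4, 8), (6, 10)]
```

Reusing a few samples in the last batch keeps the batch size constant, at the cost of those samples being seen slightly more often than the rest.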


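# Note: newer TensorFlow/Keras releases deprecate fit_generator; there, model.fit
# accepts the generator directly with the same arguments.
model.fit_generator(dataset_split(train_x, train_y, batch_size),
                    validation_data=(val_X, val_y),
                    # Match the number of batches the generator yields per pass,
                    # so that each epoch covers the whole training set once.
                    steps_per_epoch=math.ceil(len(train_x) / batch_size),
                    epochs=EPOCHS,
                    verbose=2,
                    callbacks=[csv_logger, checkpointer])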
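
The value of steps_per_epoch decides how many batches Keras draws from the generator before it ends an epoch, and the generator is not restarted between epochs. The sketch below compares floor and ceiling division on made-up numbers, under the assumption that the generator yields `math.ceil(len(images) / batch_size)` batches per pass, as dataset_split does. The function names are hypothetical and used only for illustration.

```python
import math


def batches_per_pass(n_samples, batch_size):
    """Batches the generator yields before it wraps around to the start."""
    return math.ceil(n_samples / batch_size)


def start_offset(epoch, steps_per_epoch, n_samples, batch_size):
    """Position (within one pass) of the first batch drawn in a given epoch.

    The same generator keeps being consumed, so each epoch starts
    wherever the previous one stopped.
    """
    return (epoch * steps_per_epoch) % batches_per_pass(n_samples, batch_size)


n_samples, batch_size = 10, 4                    # illustrative numbers only
floor_steps = n_samples // batch_size            # 2 steps per epoch
ceil_steps = math.ceil(n_samples / batch_size)   # 3 steps per epoch

# Floor division: epochs start at different positions in the data.
print([start_offset(e, floor_steps, n_samples, batch_size) for e in range(4)])
# Ceiling division: every epoch starts at the first batch.
print([start_offset(e, ceil_steps, n_samples, batch_size) for e in range(4)])
```

With floor division no data is lost over the long run, since the generator loops forever, but an epoch no longer corresponds to one full pass over the training set.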

 

Original article: https://blog.csdn.net/qq_21466543/article/details/107150645