InvalidArgumentError caused by using stateful and return_sequences in an LSTM

程序员文章站 2024-03-14 08:47:10


Problem description

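I am training a model whose first layer is an LSTM; the model is shown in the figure below. With the LSTM's default parameters, stateful = False and return_sequences = False, training runs normally.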
[Figure: model structure (image not preserved)]
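However, when the output of every step has to be returned, i.e. when the LSTM is configured with stateful = True and return_sequences = True, the model and the error are as follows: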

InvalidArgumentError Traceback (most recent call last)
in ()
21 history = model.fit(tfrecords_train_set,
22 validation_data = tfrecords_valid_set,
---> 23 epochs = epochs, verbose=2, callbacks = callbacks)
24
25 scores = model.evaluate(tfrecords_test_set, verbose=2)
…
…
…
InvalidArgumentError: [Derived] Incompatible shapes: [3,128] vs. [64,128]
[[{{node while_19/body/_1/add}}]]
[[model_14/lstm1/StatefulPartitionedCall]] [Op:__inference_distributed_function_118033]

Function call stack:
distributed_function -> distributed_function -> distributed_function

Cause

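The message says Incompatible shapes: [3,128] vs. [64,128]. 64 is the batch_size I set and 128 is the number of time steps (time_step) of the input data, so where does the 3 come from? After a lot of trial and error I found that it comes from how the dataset is batched. I split the dataset with batch(), and when only batch_size is passed and no other argument is set, the leftover samples that do not fill a whole batch are still emitted as a final, smaller batch. The code is as follows: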
[Figure: dataset batching code (image not preserved)]
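Here the last 3 samples ended up in a batch of their own. With stateful = True, the LSTM carries its state from one batch to the next, so the batch size has to be fixed in advance and the input shape given to Input must include it. I am using the Keras functional API, where the input shape is specified with batch_shape, as follows: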

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, input_dims), name='inputs')

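When training reaches the last batch, the batch_size declared in the input shape no longer matches the actual size of the incoming batch, which triggers the error.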
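The arithmetic behind the stray 3 can be sketched in plain Python. This is only an illustration: the helper batch_sizes and the sample count of 195 are hypothetical, chosen so that 3 samples are left over after filling batches of 64. The real dataset size is not given in this post.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced when the remainder is kept (hypothetical helper)."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        # The leftover samples form one final, smaller batch.
        sizes.append(remainder)
    return sizes


# Assumed sample count: three full batches of 64 plus 3 leftover samples.
num_samples = 3 * 64 + 3
sizes = batch_sizes(num_samples, 64)
print(sizes)  # [64, 64, 64, 3]
```

Every batch except the last matches the declared batch size of 64; the final one has only 3 rows, which is the mismatch reported in the error.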

Solution

The fix here is to drop the last, incomplete batch.
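One way to drop the incomplete batch with tf.data is the drop_remainder argument of Dataset.batch(). The following is a minimal sketch, not the original pipeline: the all-zero features tensor, input_dims = 8, and the sample count of 195 are assumptions made up for illustration, whereas the original code reads from TFRecord datasets.

```python
import tensorflow as tf

batch_size = 64
timesteps = 128
input_dims = 8                      # assumed feature size, for illustration only
num_samples = 3 * batch_size + 3    # assumed count that leaves 3 samples over

# Dummy data standing in for the real training set.
features = tf.zeros((num_samples, timesteps, input_dims))
dataset = tf.data.Dataset.from_tensor_slices(features)

# Default behaviour: the 3 leftover samples become a final, smaller batch.
default_sizes = [int(batch.shape[0]) for batch in dataset.batch(batch_size)]

# drop_remainder=True discards the incomplete batch, so every batch has 64 rows.
fixed_sizes = [
    int(batch.shape[0])
    for batch in dataset.batch(batch_size, drop_remainder=True)
]

print(default_sizes)  # [64, 64, 64, 3]
print(fixed_sizes)    # [64, 64, 64]
```

Because the model's input is declared with a fixed batch size, the validation and test datasets passed to fit() and evaluate() would presumably need the same treatment. The dropped samples are not used, which is usually an acceptable cost when the remainder is small.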