I am trying to understand the validation_steps parameter of tf.keras.Model.fit. The documentation describes it as:
Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
For instance, the TFDS MNIST dataset has 60,000 train and 10,000 test records. I am trying to consume all the records over num_epochs=2 epochs with batch_size=8, using generators as the data sources for the model.
(train, test), info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
x_generator = train.batch(batch_size).as_numpy_iterator()
v_generator = test.batch(batch_size).as_numpy_iterator() # using 'test' for validation here
By my calculation, the training data can supply 3750 = 60000 / (batch_size=8) / (epochs=2) batches per epoch, and the test data can supply 625 = 10000 / (batch_size=8) / (epochs=2) batches per epoch.
def f(image, label):
    return 1
num_total_train_records = len(list(  # 60000
    train.map(f)
))
num_total_test_records = len(list(  # 10000
    test.map(f)
))
print(num_total_train_records, num_total_test_records)
-----
60000 10000
num_epochs = 2
batch_size = 8
num_x_batches_per_epoch = int(np.floor(num_total_train_records / batch_size / num_epochs))
num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs))
print(num_x_batches_per_epoch, num_v_batches_per_epoch)
# ---
# show 3750 625
However, calling tf.keras.Model.fit with validation_steps=625 causes the error: Your input ran out of data... Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 625 batches).
model.fit(
    x=x_generator,
    epochs=num_epochs,
    batch_size=batch_size,  # not using batch_size arg makes no difference
    steps_per_epoch=num_x_batches_per_epoch,
    validation_data=v_generator,
    validation_steps=num_v_batches_per_epoch,
    validation_batch_size=batch_size
)
Your input ran out of data; interrupting training.
Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches
(in this case, 625 batches). You may need to use the repeat() function when building your dataset.
2023-11-21 17:39:33.226528: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 17391114698345974101
2023-11-21 17:39:33.226580: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 8226056677969075330
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 625 batches). You may need to use the repeat() function when building your dataset.
Code
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
(train, test), info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
def f(image, label):
    return 1
num_total_train_records = len(list(
    train.map(f)
))
num_total_test_records = len(list(
    test.map(f)
))
print(num_total_train_records, num_total_test_records)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
num_epochs = 2
batch_size = 8
num_x_batches_per_epoch = int(np.floor(num_total_train_records / batch_size / num_epochs))
num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs))
print(num_x_batches_per_epoch, num_v_batches_per_epoch)
# ---
# will show 3750 625
x_generator = train.batch(batch_size).as_numpy_iterator()
v_generator = test.batch(batch_size).as_numpy_iterator()
model.fit(
    x=x_generator,
    epochs=num_epochs,
    batch_size=batch_size,
    steps_per_epoch=num_x_batches_per_epoch,
    validation_data=v_generator,
    validation_steps=num_v_batches_per_epoch,
    validation_batch_size=batch_size
)
Subtracting 1 from the validation step count makes it work.
num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs)) - 1  # Runs out of data without the -1
Please help me understand this behavior. Also, the documentation says "Only relevant if validation_data is provided and is a tf.data dataset", but evidently the parameter also takes effect with generators, not only with tf.data.Dataset.
Environment
tensorflow 2.14.1
Python 3.10.12
Ubuntu 22.04 LTS
2 Answers
First of all, you are computing the steps incorrectly. Don’t divide by the number of epochs at all; divide only the dataset size by the batch size, because the steps are the number of batches fetched in each epoch.
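As a quick check of that arithmetic, here is a minimal sketch using the split sizes, batch size, and epoch count from the question. It assumes each epoch makes one full pass over the data (for example a dataset that is re-iterated or repeated each epoch); the variable names are illustrative only.

```python
# Sizes and settings taken from the question.
num_train_records = 60_000
num_test_records = 10_000
batch_size = 8
num_epochs = 2

# Formula used in the question: dividing by the epoch count halves the step counts.
steps_in_question = num_train_records // batch_size // num_epochs
val_steps_in_question = num_test_records // batch_size // num_epochs

# Steps are counted per epoch, so the epoch count does not enter the formula.
steps_per_epoch = num_train_records // batch_size
validation_steps = num_test_records // batch_size

print(steps_in_question, val_steps_in_question)  # 3750 625
print(steps_per_epoch, validation_steps)         # 7500 1250
```

Note that a single-pass Python iterator, such as the one returned by as_numpy_iterator(), is not restarted between epochs, so these per-epoch counts only apply when the data source can be traversed again each epoch.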
Second, you are using TensorFlow dataset objects, not a custom generator, so if you pass them to fit directly (rather than through as_numpy_iterator()), you don’t need to specify the steps for training or validation.
Check this tutorial:
https://www.tensorflow.org/datasets/keras_example
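A minimal sketch of passing tf.data.Dataset objects straight to fit, without steps_per_epoch or validation_steps, might look like the following. It uses small random arrays as a stand-in for MNIST (an assumption made only to keep the example self-contained and fast); the model mirrors the one in the question.

```python
import numpy as np
import tensorflow as tf

# Assumption: random data shaped like MNIST, used only for illustration.
rng = np.random.default_rng(0)
images = rng.random((64, 28, 28), dtype=np.float32)
labels = rng.integers(0, 10, size=64)

# Batched datasets; Keras iterates each one fully once per epoch.
train_ds = tf.data.Dataset.from_tensor_slices((images[:48], labels[:48])).batch(8)
val_ds = tf.data.Dataset.from_tensor_slices((images[48:], labels[48:])).batch(8)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# No steps_per_epoch / validation_steps: the datasets are finite and re-iterable.
history = model.fit(train_ds, epochs=2, validation_data=val_ds, verbose=0)
print(len(history.history['loss']), len(history.history['val_loss']))
```

Because a finite dataset is iterated from the start in every epoch, validation here covers the whole validation set each time without any step arithmetic.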
Specifically for tfds datasets, you can get the total number of observations with with_info=True, as you did. Then, to compute the correct steps_per_epoch, you can use the floor division operator to get the correct number of batches:
Complete example:
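The floor-division step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the answer's original code: steps_for is a hypothetical helper, the counts are the MNIST split sizes from the question (with tfds they would typically be read from info.splits['train'].num_examples), and batch_size=32 is chosen here only so that the test split leaves a remainder.

```python
import math


def steps_for(num_examples: int, batch_size: int, drop_remainder: bool = True) -> int:
    """Hypothetical helper: number of batches in one full pass over the data."""
    if drop_remainder:
        # Floor division counts only full batches.
        return num_examples // batch_size
    # Otherwise the final, smaller batch counts as one more step.
    return math.ceil(num_examples / batch_size)


# Split sizes from the question; batch_size=32 is an assumption for illustration.
num_train, num_test = 60_000, 10_000
batch_size = 32

steps_per_epoch = steps_for(num_train, batch_size)
validation_steps = steps_for(num_test, batch_size)
validation_steps_with_partial = steps_for(num_test, batch_size, drop_remainder=False)

print(steps_per_epoch, validation_steps, validation_steps_with_partial)  # 1875 312 313
```

Floor division is the safe choice when the step count must never exceed the number of available batches; the trade-off is that the last partial batch, if any, is skipped.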