
I'm trying to understand the validation_steps parameter of tf.keras.Model.fit. The documentation describes it as:

Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.

For instance, the TFDS MNIST dataset has 60,000 train and 10,000 test records. I'm trying to consume all of them over num_epochs=2 epochs with batch_size=8, using generators as the data sources for the model.

(train, test), info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

x_generator = train.batch(batch_size).as_numpy_iterator()
v_generator = test.batch(batch_size).as_numpy_iterator()   # using 'test' for validation here

The training data can supply 3750 batches (60000 / batch_size=8 / num_epochs=2), and the test data can supply 625 batches (10000 / 8 / 2).

def f(image, label):
    return 1    # map every record to 1 so len(list(...)) counts the records

num_total_train_records = len(list(        # 60000
    train.map(f)
))
num_total_test_records = len(list(         # 10000
    test.map(f)
))
print(num_total_train_records, num_total_test_records)
# ---
# shows 60000 10000

num_epochs = 2
batch_size = 8

num_x_batches_per_epoch = int(np.floor(num_total_train_records / batch_size / num_epochs))
num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs)) 
print(num_x_batches_per_epoch, num_v_batches_per_epoch)
# ---
# shows 3750 625

However, setting tf.keras.Model.fit(validation_steps=625) causes the error "Your input ran out of data... Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 625 batches)."

model.fit(
    x=x_generator,
    epochs=num_epochs,
    batch_size=batch_size,    # omitting the batch_size arg makes no difference
    steps_per_epoch=num_x_batches_per_epoch,
    validation_data=v_generator,
    validation_steps=num_v_batches_per_epoch,
    validation_batch_size=batch_size
)
Your input ran out of data; interrupting training. 
Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches 
(in this case, 625 batches). You may need to use the repeat() function when building your dataset.

2023-11-21 17:39:33.226528: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 17391114698345974101
2023-11-21 17:39:33.226580: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 8226056677969075330
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 625 batches). You may need to use the repeat() function when building your dataset.
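
As a sanity check, a dataset's numpy iterator is consumed exactly once and is not reset between epochs, which seems to be where the data "runs out". A minimal illustration:

import tensorflow as tf

ds = tf.data.Dataset.range(4).batch(2)
it = ds.as_numpy_iterator()
print(list(it))   # [array([0, 1]), array([2, 3])]
print(list(it))   # [] -- the iterator is exhausted and does not restart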

Code

import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds


(train, test), info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


def f(image, label):
    return 1    # map every record to 1 so len(list(...)) counts the records

num_total_train_records = len(list(
    train.map(f)
))
num_total_test_records = len(list(
    test.map(f)
))
print(num_total_train_records, num_total_test_records)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

num_epochs = 2
batch_size = 8

num_x_batches_per_epoch = int(np.floor(num_total_train_records / batch_size / num_epochs))
num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs)) 
print(num_x_batches_per_epoch, num_v_batches_per_epoch)
# ---
# will show 3750 625


x_generator = train.batch(batch_size).as_numpy_iterator()
v_generator = test.batch(batch_size).as_numpy_iterator()

model.fit(
    x=x_generator,
    epochs=num_epochs,
    batch_size=batch_size,
    steps_per_epoch=num_x_batches_per_epoch,
    validation_data=v_generator,
    validation_steps=num_v_batches_per_epoch,
    validation_batch_size=batch_size
)

Subtracting 1 from validation_steps makes it work:

num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size / num_epochs)) - 1  # causes "ran out of data" without the -1

Please help me understand this behavior. Also, the documentation says validation_steps is "Only relevant if validation_data is provided and is a tf.data dataset.", but evidently it applies to other input types as well.
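
For reference, the tf.data case the documentation seems to describe would be passing the batched datasets directly rather than iterators. A sketch of that variant (not what my code above does):

model.fit(
    x=train.batch(batch_size),
    epochs=num_epochs,
    validation_data=test.batch(batch_size),
    validation_steps=num_total_test_records // batch_size,  # 1250; a Dataset is re-iterated each epoch
)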

Environment

tensorflow 2.14.1
Python 3.10.12
Ubuntu 22.04 LTS


Answers


  1. First of all, you are computing the steps wrong: you don't divide by the number of epochs at all, only the dataset size by the batch size, because the steps are the number of batches fetched per epoch:

    num_x_batches_per_epoch = int(np.floor(num_total_train_records / batch_size))
    num_v_batches_per_epoch = int(np.floor(num_total_test_records / batch_size))
    

    Second, you are using TensorFlow dataset objects

    (train, test), info = tfds.load(...)
    

    so not a custom generator, which means you don't need to specify the steps for training or validation at all.

    Check this tutorial:
    https://www.tensorflow.org/datasets/keras_example
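
    For illustration, a minimal sketch in the spirit of that tutorial (reusing the model from the question and omitting the tutorial's normalization and caching steps) would be:

    import tensorflow_datasets as tfds

    (ds_train, ds_test), info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )

    # Pass the batched tf.data.Dataset objects to fit() directly: no steps
    # arguments are needed, because Keras re-iterates a Dataset every epoch.
    model.fit(
        ds_train.batch(8),
        epochs=2,
        validation_data=ds_test.batch(8),
    )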

  2. Specifically for tfds datasets, you can get the total number of observations with with_info=True, like you did.

    import tensorflow_datasets as tfds
    import tensorflow as tf
    
    (train, test), info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )
    
    num_total_train_records = info.splits['train'].num_examples
    num_total_test_records = info.splits['test'].num_examples
    

    Then, to compute steps_per_epoch, you can use the floor division operator to get the correct number of batches:

    num_x_batches_per_epoch = num_total_train_records // batch_size
    num_v_batches_per_epoch = num_total_test_records // batch_size 
    

    Complete example:

    import tensorflow_datasets as tfds
    import tensorflow as tf
    (train, test), info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )
    num_total_train_records = info.splits['train'].num_examples
    num_total_test_records = info.splits['test'].num_examples
    
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10)
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    )
    
    num_epochs = 2
    batch_size = 8
    
    num_x_batches_per_epoch = num_total_train_records // batch_size
    num_v_batches_per_epoch = num_total_test_records // batch_size 
    x_generator = train.repeat().batch(batch_size).as_numpy_iterator()   # repeat() so the iterators can serve every epoch
    v_generator = test.repeat().batch(batch_size).as_numpy_iterator()
    model.fit(
        x=x_generator,
        epochs=num_epochs,
        batch_size=batch_size,
        steps_per_epoch=num_x_batches_per_epoch,
        validation_data=v_generator,
        validation_steps=num_v_batches_per_epoch,
        validation_batch_size=batch_size
    )
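
    Without the repeat() call the finite iterators would be exhausted after the first of the two epochs, which is exactly the "ran out of data" warning from the question.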
    