The TensorFlow versions in which I can still recreate this behavior are 2.7.0, 2.7.3, 2.8.0, and 2.9.0. Actually, these are all the versions I have tried; I wasn't able to resolve the issue in any of them.
- OS: Ubuntu 20
- GPU: RTX 2060
- RAM: 16GB
I am trying to feed my data to a model using a generator:
class DataGen(tf.keras.utils.Sequence):
    def __init__(self, indices, batch_size):
        # X and y are the h5py datasets loaded at module level (see below)
        self.X = X
        self.y = y
        self.indices = indices
        self.batch_size = batch_size

    def __getitem__(self, index):
        X_batch = self.X[self.indices][
            index * self.batch_size : (index + 1) * self.batch_size
        ]
        y_batch = self.y[self.indices][
            index * self.batch_size : (index + 1) * self.batch_size
        ]
        return X_batch, y_batch

    def __len__(self):
        return len(self.y[self.indices]) // self.batch_size

train_gen = DataGen(train_indices, 32)
val_gen = DataGen(val_indices, 32)
test_gen = DataGen(test_indices, 32)
where X and y are my dataset loaded from a .h5 file using h5py, and train_indices, val_indices, and test_indices are the indices for each split that will be used to index into X and y.
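For context, X and y come from something like the following (a minimal sketch; the file name and dataset keys are illustrative, not my exact code):

import h5py

h5_file = h5py.File('data.h5', 'r')  # keep the file open so reads stay lazy
X = h5_file['X']  # h5py Dataset: rows are only read from disk when indexed
y = h5_file['y']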
I am creating the model and feeding the data using:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense

# setup model
base_model = tf.keras.applications.MobileNetV2(input_shape=(128, 128, 3),
                                               include_top=False)
base_model.trainable = False

mobilenet1 = Sequential([
    base_model,
    Flatten(),
    Dense(27, activation='softmax')
])

mobilenet1.compile(optimizer=tf.keras.optimizers.Adam(),
                   loss=tf.keras.losses.CategoricalCrossentropy(),
                   metrics=['accuracy'])

# model training
hist_mobilenet = mobilenet1.fit(train_gen, validation_data=val_gen, epochs=1)
Memory usage right before training is 8%, but the moment training starts it climbs to anywhere between 30% and 60%. Since I am using a generator and loading the data in small batches of 32 observations at a time, it seems odd that memory climbs this high. Also, even when training stops, memory usage stays above 30%. I checked all global variables, but none of them is anywhere near that large. If I start another training session, memory usage climbs even higher, and eventually the Jupyter notebook kernel dies.
Is something wrong with my implementation, or is this normal?
Edit 1: some additional info.
- Whenever training stops, memory usage drops a little, and I can decrease it further by calling the garbage collector (see the snippet after this list). However, I cannot bring it back down to 8%, even when I delete the history created by fit.
- The x and y batches' sizes sum up to 48 bytes; this outrages me! How come loading 48 bytes of data at a time causes memory usage to increase that much? Supposedly I am using an HDF5 dataset precisely so I can handle the data without overloading RAM. The next thing that comes to mind is that fit creates some variables, but it doesn't make sense that it would need so many GBs of memory to store them.
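This is roughly the cleanup I run after training (a minimal sketch of what I mean by deleting the history and calling the garbage collector):

import gc

del hist_mobilenet  # drop the History object returned by fit
gc.collect()        # reclaims a little more memory, but usage stays well above 8%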
3 Answers
How to minimize RAM usage
From the very helpful comments and answers of our fellow friends, I came to this conclusion:
- I am using the garbage collector, just to be safe.
- The key fix is that __getitem__ must load only the current batch. Below is a sketch of the fix, shown for the X line only.

Correct solution:
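# slice the index array first, then read only those rows from the HDF5 dataset
# (note: h5py requires the selected indices to be in increasing order)
X_batch = self.X[self.indices[index * self.batch_size : (index + 1) * self.batch_size]]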
Wrong solution (this is exactly what my original __getitem__ does):
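# self.X[self.indices] first materializes the WHOLE split in RAM,
# then slices the 32-row batch out of that in-memory copy
X_batch = self.X[self.indices][index * self.batch_size : (index + 1) * self.batch_size]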
The same goes for y.
Notice the difference? In the wrong solution I am loading the whole dataset (training, validation, or testing split) into memory! Instead, in the correct solution I am only loading the batch meant to be fed into the fit method. With this setup, I managed to raise RAM usage only to 2.88 GB, which is pretty cool!
Literally, this is not a generator. When you instantiate DataGen, you create a complete class instance with the full index arrays (__init__(self, indices, batch_size)), with references to the whole datasets (self.X, self.y), with inheritance from Sequence, and so on.
The simplest real generator for TensorFlow looks something like this (a minimal sketch; the function name and the sorting detail are my own illustration):
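import numpy as np

def data_generator(X, y, indices, batch_size):
    # a plain Python generator: nothing is held in memory except the current batch
    while True:  # loop forever so Keras can run any number of epochs
        for i in range(len(indices) // batch_size):
            batch_idx = np.sort(indices[i * batch_size : (i + 1) * batch_size])
            yield X[batch_idx], y[batch_idx]  # h5py reads only these rows from disk

Note that when you pass a plain, infinite generator like this to fit, you also have to supply steps_per_epoch so Keras knows where each epoch ends.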
Make use of fit_generator instead of the fit method. I mean, instead of

hist_mobilenet = mobilenet1.fit(train_gen, validation_data=val_gen, epochs=1)

use

hist_mobilenet = mobilenet1.fit_generator(train_gen, validation_data=val_gen, epochs=1)
According to this answer: "I think the fit_generator will load data batch-wise and not take up the whole RAM instantly."