I have a requirement to update many documents (300 thousand+) in the Realtime Database. I have written JS code to do this using the Firebase Admin SDK. The code works fine when there are few documents (~3,000), but fails with a JS heap out of memory error for larger numbers (tested with 14k).
My JS code:
const firebaseAuth = require('./firebase/firebase.admin')

async function main() {
  await firebaseAuth.firebaseAuthInit();
  let db = firebaseAuth.admin.database();
  let usersRef = db.ref("users");

  // Load the entire /users node once, then queue one update per user that still has Settings.
  return usersRef.once("value").then(async function (snapshot) {
    let updates = [];
    let usersCount = 0;
    snapshot.forEach(function (userSnapshot) {
      let user = userSnapshot.val();
      let userKey = userSnapshot.key;
      if (user.hasOwnProperty('Settings')) {
        //let promise = await usersRef.child(userKey).get();
        let promise = usersRef.child(userKey).update({
          'Settings': null
        }).then(() =>
          console.log(`Removed Settings attribute for user: ${userKey}`)
        );
        updates.push(promise);
      }
      usersCount++;
    });
    let settingRemoved = (await Promise.all(updates)).length;
    console.log(`Number of Users: ${usersCount}. Total no of User Settings Deleted: ${settingRemoved}`);
  }, function (error) {
    console.error(error);
  }).then(function () {
    process.exit();
  });
}

main()
The error I got is:
<--- Last few GCs --->
[683:0x5cea240] 76471 ms: Scavenge 2037.5 (2062.5) -> 2035.2 (2063.3) MB, 11.3 / 0.0 ms (average mu = 0.185, current mu = 0.148) allocation failure
[683:0x5cea240] 76484 ms: Scavenge 2038.4 (2063.3) -> 2036.1 (2064.0) MB, 6.1 / 0.0 ms (average mu = 0.185, current mu = 0.148) allocation failure
[683:0x5cea240] 76507 ms: Scavenge 2039.0 (2064.0) -> 2036.9 (2072.8) MB, 7.8 / 0.0 ms (average mu = 0.185, current mu = 0.148) allocation failure
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb200e0 node::Abort() [node]
2: 0xa3c157 node::FatalError(char const*, char const*) [node]
3: 0xd083ae v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xd08727 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xee9785 [node]
6: 0xeea2cc [node]
7: 0xef8269 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0xefb535 v8::internal::Heap::HandleGCRequest() [node]
9: 0xe8fc17 v8::internal::StackGuard::HandleInterrupts() [node]
10: 0x12364e2 v8::internal::Runtime_StackGuardWithGap(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x1640839 [node]
Aborted (core dumped)
The issue might be caused by pushing all the promises into a single array, which overflows the heap.
Is there any way I can avoid this? I couldn't find a way to do these updates in batches or sequentially (though sequential processing might not be viable if the runtime goes beyond 5 minutes for 300k records).
Node version: 16.4
Answers
That’s the expected behavior, since 14k nodes is a lot of data that doesn’t actually fit into memory. There are two ways you can solve this situation.
The first one would be to update the data only on demand. Say you have a particular page that displays some data, where one field should no longer exist and another should be updated. Perform the check against those fields only when the user opens that page: if the first field has not yet been deleted and the other has not yet been updated, perform the deletion and the update and then display the data; otherwise simply display the data. This way you only update the data when it is actually needed, and most likely you will never end up updating all 300 thousand+ nodes, since not all users will open that screen.
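A rough sketch of that on-demand check, assuming the page's load path can call a small helper like the one below (the helper name and the way the database handle is passed in are illustrative, not from the answer):

// Hypothetical helper: run when the user opens the page that shows their data.
// Assumes `db` is an initialized Realtime Database instance (same namespaced
// API style as the code in the question) and `userKey` identifies the user.
async function loadUserForPage(db, userKey) {
  const userRef = db.ref(`users/${userKey}`);
  const snapshot = await userRef.once("value");
  const user = snapshot.val();

  // Lazy migration: only fix this user's node if it still has the old field.
  if (user && user.hasOwnProperty('Settings')) {
    await userRef.update({ 'Settings': null });
    delete user.Settings;
  }

  return user; // then render the (now migrated) data
}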
The second solution would be to create a Cloud Function that updates all the data by performing multiple simultaneous updates. But before starting, check the Realtime Database Limits.
In addition to the two approaches Alex suggested, a third option would be to update the users in chunks. Say that you decide to process 100 users at a time: read the first chunk with ref.orderByKey().limitToFirst(100), and then read each following chunk with ref.orderByKey().startAfter(theLastKey).limitToFirst(100), where theLastKey is the key of the last user in the previous chunk.
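A sketch of how that pagination could look as a loop, assuming the same Admin SDK setup as in the question (the helper name, the chunk size of 100, and awaiting each chunk before loading the next are illustrative choices):

// Illustrative sketch: walk /users in pages of 100 and clear Settings per page.
// `db` is assumed to be the same admin.database() instance as in the question.
async function removeSettingsInChunks(db) {
  const usersRef = db.ref("users");
  let lastKey = null;

  while (true) {
    // First page: just limitToFirst(100). Later pages: start after the last key seen.
    let query = usersRef.orderByKey().limitToFirst(100);
    if (lastKey !== null) {
      query = usersRef.orderByKey().startAfter(lastKey).limitToFirst(100);
    }

    const snapshot = await query.once("value");
    if (!snapshot.exists()) break; // no more users to process

    const updates = [];
    snapshot.forEach((userSnapshot) => {
      lastKey = userSnapshot.key;
      if (userSnapshot.val().hasOwnProperty('Settings')) {
        updates.push(usersRef.child(lastKey).update({ 'Settings': null }));
      }
    });
    await Promise.all(updates); // finish this chunk before loading the next
  }
}

Since only one chunk of users is held in memory at a time, heap usage stays bounded no matter how many users there are in total.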
Alternatively, since you are removing a node, you can also use that to filter which child nodes you still need to update. With that, the query for each chunk becomes ref.orderByChild("Settings").limitToFirst(100). In addition to being a bit simpler, this also has the advantage that you're not reading nodes that already don't have a Settings property.
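A sketch of that variant under the same assumptions. One caveat worth flagging: in Realtime Database queries, children without the ordered-by child sort first, so this sketch uses limitToLast(100) instead of limitToFirst(100) so that each chunk preferentially contains users that still have a Settings value, and it keeps a per-node check. That adaptation is mine, not part of the original answer.

// Illustrative sketch: repeatedly grab a chunk of users ordered by Settings and clear it.
// `db` is assumed to be the same admin.database() instance as in the question.
async function removeSettingsFiltered(db) {
  const usersRef = db.ref("users");

  while (true) {
    // Users that still have a Settings value sort after those that don't,
    // so limitToLast(100) keeps pulling the remaining un-migrated users.
    const snapshot = await usersRef.orderByChild("Settings").limitToLast(100).once("value");

    const updates = [];
    snapshot.forEach((userSnapshot) => {
      if (userSnapshot.val().hasOwnProperty('Settings')) {
        updates.push(usersRef.child(userSnapshot.key).update({ 'Settings': null }));
      }
    });

    if (updates.length === 0) break; // no user in this chunk still has Settings
    await Promise.all(updates);
  }
}

Each pass clears up to 100 Settings nodes; once a chunk contains none, everything has been migrated.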