We have a PostgreSQL cluster managed by DigitalOcean that we want to back up to Google Cloud Storage on a schedule. I’m planning on creating a Cloud Function that runs on a schedule.
Because our databases aren’t stored on Google Cloud, I can’t use this guide that Google has published. And since pg_dump
has to be installed on the system to be available, that isn’t an option in Cloud Functions either.
What options do I have? Is it possible to use something like psycopg2
in Python or pg
in Node.js?
2
Answers
tl;dr – dump at DO side, and upload dump to google, or use dedicated backup tool.
I think issue here is with trying to do that at Google side. Looking at Digital Ocean’s docs – pg_dump / pg_restore can be used. So getting the dump should be easy – upload as well. Side note – I would not use lambda / functions for that. It can take time and resources, so can be canceled, while I would like to have backup 🙂
So, please, consider other way. Here is tutorial for setting barman – that is open source tool for managing backups, recovery. Difference is that it can be not only once per day, but also it can store wals, so point in time restore can be done. Sure, after a week or something you can stick to dumps and even after a year keep only one per week.
Besides above, there are tools like https://debezium.io/ – here it looks like overkill, but maybe dump is only part of the story, so to me worth to mention.
The latter would kind of emulate what Cloud Run is supposed to handle for you, but it might be more intuitive (that’s pretty much how you’d do it at home or on-prem). Might also be a bit more cost-effective, since more "cloud-native" solutions like Cloud Run, typically come at a premium – similar to how a managed database usually costs a bit more than a VM rented from the same provider, hosting the exact same db, with the exact same resources (except provider’s secret sauce they justify the premium with).
Still, a job that involves just waking up and piping
pg_dump|gsutil cp
for a few minutes should cost you little to nothing regardless of which one you pick. You might want to look into multithreadedpg_dump -j
to speed things up at the cost of some more CPU, andgsutil -m
for multithreaded upload.