
We have a PostgreSQL cluster managed by DigitalOcean that we want to back up to Google Cloud Storage on a schedule. I was planning to create a scheduled Cloud Function to do this.

Because our databases aren’t hosted on Google Cloud, I can’t use the guide that Google has published. And since pg_dump has to be installed on the underlying system, it isn’t available in Cloud Functions either.

What options do I have? Is it possible to use something like psycopg2 in Python or pg in Node.js?

2 Answers


  1. tl;dr – take the dump on the DigitalOcean side and upload it to Google Cloud Storage, or use a dedicated backup tool.

    I think the issue here is trying to do all of this on the Google side. Looking at DigitalOcean’s docs, pg_dump / pg_restore can be used against the managed cluster, so getting the dump should be easy, and so should the upload; a minimal sketch is at the end of this answer. Side note: I would not use Lambda / Cloud Functions for this. A dump can take time and resources, so the function can be cancelled partway through, and I’d rather actually have my backup 🙂

    So please consider another approach. There are tutorials for setting up Barman, an open source tool for managing backups and recovery. The difference is that it isn’t limited to one dump per day: it can also archive WALs, so point-in-time recovery becomes possible. Sure, after a week or so you can fall back to plain dumps, and after a year keep only one per week.

    Besides the above, there are tools like https://debezium.io/. Here it looks like overkill, but maybe a dump is only part of the story, so it seemed worth mentioning.
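
    A minimal sketch of the dump-and-upload approach, assuming a machine (a DO droplet or any VM that can reach the cluster) with pg_dump and gsutil installed; the connection string, bucket name, and paths below are placeholders:

        #!/usr/bin/env bash
        # Hypothetical values: replace with your own DO connection string and GCS bucket.
        set -euo pipefail

        DB_URI="postgresql://doadmin:PASSWORD@your-db-host.db.ondigitalocean.com:25060/defaultdb?sslmode=require"
        BUCKET="gs://my-backup-bucket"
        STAMP="$(date +%Y%m%d-%H%M%S)"

        # Dump in custom format (compressed, restorable with pg_restore).
        pg_dump --format=custom --file="/tmp/backup-${STAMP}.dump" "${DB_URI}"

        # Upload to Google Cloud Storage, then clean up the local copy.
        gsutil cp "/tmp/backup-${STAMP}.dump" "${BUCKET}/postgres/backup-${STAMP}.dump"
        rm -f "/tmp/backup-${STAMP}.dump"

    A single cron entry such as 0 2 * * * /usr/local/bin/pg_backup.sh is enough to run it nightly.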

  2. "Since pg_dump has to be installed on the system to be available, that isn’t an option in Cloud Functions either. What options do I have?"

    1. Cloud Run. While Cloud Functions are meant to abstract the underlying platform away, Cloud Run lets you define both the code and the container it runs in, so you can bake pg_dump into the image. Both services make up what Google advertises as Cloud Serverless.
    2. Compute Engine: a tiny VM with a simple backup script on it, scheduled to wake up, run the script, then go back to sleep (a scheduling sketch follows the next paragraph). You can save that as a custom image and stamp out multiple small backup workers from it.

    The latter roughly emulates what Cloud Run is supposed to handle for you, but it might be more intuitive (it’s pretty much how you’d do it at home or on-prem). It might also be a bit more cost-effective, since more "cloud-native" offerings like Cloud Run typically come at a premium – similar to how a managed database usually costs a bit more than a VM rented from the same provider, hosting the exact same database with the exact same resources (plus whatever secret sauce the provider uses to justify the premium).
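
    One way to wire up the second option, assuming an instance named backup-worker and hypothetical resource names (the gcloud flags are worth double-checking against the current reference): an instance schedule starts the VM shortly before the backup window and stops it afterwards, and cron on the VM runs the dump script in between.

        # Hypothetical names and region; verify flags against the current gcloud docs.
        gcloud compute resource-policies create instance-schedule backup-window \
            --region=us-central1 \
            --vm-start-schedule="50 1 * * *" \
            --vm-stop-schedule="30 2 * * *" \
            --timezone="UTC"

        gcloud compute instances add-resource-policies backup-worker \
            --zone=us-central1-a \
            --resource-policies=backup-window

        # /etc/cron.d/pg-backup on the VM itself (hypothetical script path):
        # 0 2 * * * root /usr/local/bin/pg_backup.sh >> /var/log/pg_backup.log 2>&1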

    Still, a job that just wakes up and pipes pg_dump | gsutil cp for a few minutes should cost you little to nothing regardless of which one you pick. You might also want to look into parallel pg_dump -j to speed things up at the cost of some more CPU, and gsutil -m for parallel uploads – a sketch of both follows.
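
    A rough sketch of both variants, reusing the placeholder DB_URI and bucket name from the sketch in the first answer (adjust to your own setup). Note that pg_dump -j requires the directory format, which cannot be piped:

        # Single-stream variant: pipe the dump straight into GCS, no local disk needed.
        pg_dump --format=custom "${DB_URI}" | gsutil cp - "gs://my-backup-bucket/backup-$(date +%F).dump"

        # Parallel variant: -j requires the directory format (-Fd), which cannot be piped,
        # so dump to a local directory first and upload it with parallel gsutil.
        pg_dump --format=directory --jobs=4 --file="/tmp/backup-dir" "${DB_URI}"
        gsutil -m cp -r "/tmp/backup-dir" "gs://my-backup-bucket/backup-$(date +%F)/"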
