I am wondering if there is a good way to back up Matillion data (projects and their underlying jobs) other than instance-level snapshots. Is there a datastore of some sort that I can back up using on-instance tooling?
Ian Funnell —
You can certainly back up the datastore if you like: it’s a MongoDB instance running on the EC2 instance, so you can use mongodump, for example. But this is essentially the same approach as taking an instance-level snapshot, and I would strongly recommend sticking with instance-level snapshots as your primary physical backup mechanism. You may have already found Matillion’s option to automate that, under Project / Manage Backups.
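As a rough sketch of the mongodump route, assuming an instance whose datastore is MongoDB listening on the default localhost port, a scheduled job could write one compressed dump per run into a timestamped directory. The helper names and the backup path are hypothetical placeholders, not part of Matillion.

```python
import datetime
import pathlib
import subprocess


def backup_dir(out_root: str, when: datetime.datetime) -> pathlib.Path:
    """Return a per-run directory such as <out_root>/20240102T030405Z."""
    # The trailing "Z" assumes `when` is in UTC.
    return pathlib.Path(out_root) / when.strftime("%Y%m%dT%H%M%SZ")


def build_mongodump_command(target: pathlib.Path) -> list[str]:
    """Build a mongodump invocation that writes gzip-compressed output to `target`."""
    # With no --host/--port, mongodump connects to localhost:27017; add those
    # (and credentials) if the on-instance MongoDB is configured differently.
    return ["mongodump", "--gzip", f"--out={target}"]


def run_backup(out_root: str) -> pathlib.Path:
    """Dump the local MongoDB datastore into a new timestamped directory."""
    target = backup_dir(out_root, datetime.datetime.now(datetime.timezone.utc))
    # check=True raises CalledProcessError if mongodump exits with an error.
    subprocess.run(build_mongodump_command(target), check=True)
    return target
```

Calling `run_backup("/var/backups/matillion")` from cron would produce one directory per run. As noted above, this is still a physical copy of the datastore, so it complements rather than replaces instance-level snapshots.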
To take a logical backup of just the Matillion metadata, you should use the Project / Export (and Project / Import) options. These are for backing up Projects and Jobs, exactly as you mentioned. The transfer format is JSON, so the metadata can easily be added to an external source control system.
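One way to keep exported JSON readable in source control is to re-serialize each export with sorted keys and fixed indentation before committing, so that successive exports of the same project produce small, stable diffs. This is a minimal sketch, assuming the import does not depend on object key order (JSON objects are unordered by specification; array order is preserved). The function names and file layout are hypothetical.

```python
import json
import pathlib


def normalize_export(raw: str) -> str:
    """Re-serialize an exported project JSON document deterministically."""
    data = json.loads(raw)
    # sort_keys fixes key order; the trailing newline stops line-based diff
    # tools from flagging the last line on every commit.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def save_export(raw: str, repo_dir: str, name: str) -> pathlib.Path:
    """Write the normalized export into a working copy of a source control repo."""
    path = pathlib.Path(repo_dir) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(normalize_export(raw), encoding="utf-8")
    return path
```

After `save_export(...)`, the file can be committed with the usual `git add` / `git commit` workflow.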
Matillion also has a REST API that allows you to save and restore an entire Project version in one call. Please see “Endpoint 1” in this document.
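A script that saves an export through the REST API could look roughly like the following. It assumes the instance accepts HTTP Basic authentication with a Matillion user’s credentials and returns the export as JSON. The endpoint URL is deliberately a parameter: take the exact path from the API documentation referenced above rather than from this sketch.

```python
import base64
import json
import urllib.request


def build_export_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for a project export endpoint using HTTP Basic auth.

    `url` is a placeholder for the documented export endpoint on your instance.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_export(url: str, user: str, password: str, dest: str, timeout: float = 60.0) -> None:
    """Download the export and save the response body verbatim to `dest`."""
    req = build_export_request(url, user, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    # Parse once so a non-JSON response (e.g. an HTML error page) fails loudly.
    json.loads(body)
    with open(dest, "wb") as f:
        f.write(body)
```

Saving the body verbatim keeps the file importable as-is; it can still be normalized separately before committing to source control.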
To clarify, we are not running on EC2 but on GCP. I do not have a “Project / Manage Backups” option available in the UI. I also do not see MongoDB running on the instance, but PostgreSQL instead. I know that PostgreSQL deals reasonably well with filesystem snapshots (on restore, it treats the previous server instance as having crashed and replays the WAL).
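If a logical copy of the PostgreSQL datastore is wanted alongside snapshots, `pg_dump` can take one from on the instance; it produces a consistent dump even while the database is in use. This is a sketch only: the database name and role are placeholders, since the name Matillion uses for its datastore on GCP is not stated here, and credentials are assumed to come from `~/.pgpass` or `PGPASSWORD`.

```python
import datetime
import pathlib
import subprocess


def build_pg_dump_command(dbname: str, user: str, target: pathlib.Path) -> list[str]:
    """Build a pg_dump invocation producing a custom-format archive at `target`."""
    # --format=custom writes a compressed archive that pg_restore can read
    # selectively; --file sends output to `target` instead of stdout.
    return ["pg_dump", "--format=custom", f"--username={user}", f"--file={target}", dbname]


def dump_datastore(dbname: str, user: str, out_root: str) -> pathlib.Path:
    """Dump `dbname` into a UTC-timestamped file under `out_root`."""
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = pathlib.Path(out_root) / f"{dbname}-{stamp}.dump"
    target.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pg_dump exits with an error.
    subprocess.run(build_pg_dump_command(dbname, user, target), check=True)
    return target
```

The resulting archive is restored with `pg_restore`, which is a different path from recovering a filesystem snapshot through WAL replay.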
I am comfortable automating the volume snapshot outside of Matillion, but I am curious: will you be adding the Manage Backups feature for GCP?
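Automating the volume snapshot outside of Matillion could be as small as a scheduled call to `gcloud compute disks snapshot`. The sketch below assumes the whole PostgreSQL data directory, including the WAL, lives on the single persistent disk being snapshotted; otherwise the snapshot is not crash-consistent. The disk name and zone are placeholders.

```python
import datetime
import subprocess


def build_snapshot_command(disk: str, zone: str, when: datetime.datetime) -> list[str]:
    """Build a gcloud command that snapshots `disk` with a timestamped name."""
    # GCP resource names allow only lowercase letters, digits and hyphens, so use
    # a compact timestamp instead of ISO 8601 with colons. Assumes the disk name
    # is short enough for the result to stay within the 63-character limit.
    name = f"{disk}-{when.strftime('%Y%m%d-%H%M%S')}"
    return [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}",
        f"--snapshot-names={name}",
    ]


def snapshot_disk(disk: str, zone: str) -> None:
    """Take one snapshot of `disk`, named with the current UTC time."""
    now = datetime.datetime.now(datetime.timezone.utc)
    # check=True raises CalledProcessError if gcloud reports a failure.
    subprocess.run(build_snapshot_command(disk, zone, now), check=True)
```

Running `snapshot_disk("matillion-disk", "us-central1-a")` from cron or Cloud Scheduler gives one snapshot per run; pruning old snapshots would be a separate step.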
You are right, there are some minor differences between the different versions of Matillion, and the GCP version doesn’t yet have the same automated backup feature that the Redshift version does. It will be added (internal feature request EMD-4049), but at the moment I’m not sure when it will appear.