Cloudify Backup and Restore Guide
Overview
Snapshots capture the state of a Cloudify HA cluster. A Cloudify snapshot should be taken daily, ideally at an off-peak time. Instead of having an operator run it manually, as shown in this guide, snapshot creation can be automated through the REST API.
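The daily, REST-driven automation suggested above could look roughly like the following POSIX shell sketch. It wraps the PUT /api/v3.1/snapshots/<snapshot-id> request from the snapshot-creation example in a function. The environment variable names (MANAGER_IP, MANAGER_USER, MANAGER_PASSWORD, MANAGER_TENANT), the daily-YYYY-MM-DD ID scheme and the DRY_RUN switch are assumptions made for this example, not Cloudify conventions.

```shell
#!/bin/sh
# Sketch of a scheduled snapshot job using the Cloudify REST API.
# MANAGER_IP, MANAGER_USER, MANAGER_PASSWORD and MANAGER_TENANT are
# hypothetical environment variables chosen for this example.

# Run a command, or just print it when DRY_RUN=1 (note: the printed
# command includes the password).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Date-stamped snapshot ID, e.g. "daily-2024-05-01"; defaults to today.
snapshot_id_for_date() {
  printf 'daily-%s\n' "${1:-$(date +%F)}"
}

# Ask the manager to create a snapshot with the given ID.
# -f makes curl return a non-zero exit code on HTTP errors.
create_snapshot() {
  run curl -s -f -X PUT \
    --header "Tenant: ${MANAGER_TENANT}" \
    -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
    "http://${MANAGER_IP}/api/v3.1/snapshots/$1"
}
```

After sourcing the file, `create_snapshot "$(snapshot_id_for_date)"` requests today's snapshot, and calling that from a cron job at an off-peak hour gives the suggested daily cadence. Setting DRY_RUN=1 prints the curl command instead of running it; because that output includes the password, keep it out of shared logs.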
The virtual machines that the Cloudify Managers run on should also be backed up at regular intervals. The schedule is dictated by your backup policy and typically combines daily, weekly, monthly and yearly backups as required. The method for backing up the Cloudify Manager virtual machines is outside the scope of this document.
Snapshots
Snapshots of the Cloudify HA cluster should be taken at regular intervals (daily is suggested). This can be automated through the REST-based Service API or done manually by an operator using the UI or the cfy CLI. The screenshot below shows the menu presented to the operator when the settings button (the cog icon at the top right of the menu) is clicked.
Creating snapshot
-
Create snapshot:
Code Block 1 CLI
cfy snapshots create --include-metrics --include-credentials SNAPSHOT_ID
Code Block 2 REST
curl -X PUT --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>"
Parameter specifications are available in the Cloudify API documentation.
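Snapshot creation runs as a workflow on the manager, so a scheduled job will usually want to wait for it to finish before downloading the archive. The sketch below polls GET /api/v3.1/snapshots/<snapshot-id>. It assumes the response is JSON with a status field that moves from creating to created (or failed), which you should confirm against the API documentation for your manager version. The grep/sed extraction is a deliberately simple stand-in for a real JSON parser such as jq, and the MANAGER_* variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: poll a snapshot until it is no longer being created.
# Assumes GET /api/v3.1/snapshots/<id> returns JSON with a "status" field
# ("creating", "created" or "failed"); verify for your manager version.
# MANAGER_* variables are hypothetical names used for this example.

# Print the first "status" value found in the JSON read from stdin.
json_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# wait_for_snapshot ID [TRIES]: return 0 once the snapshot is "created",
# 1 if it is "failed" or TRIES polls (10 s apart) pass without success.
wait_for_snapshot() {
  id=$1
  tries=${2:-60}
  while [ "$tries" -gt 0 ]; do
    status=$(curl -s -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
      --header "Tenant: ${MANAGER_TENANT}" \
      "http://${MANAGER_IP}/api/v3.1/snapshots/${id}" | json_status)
    case $status in
      created) return 0 ;;
      failed) echo "snapshot ${id} failed" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 10
  done
  echo "timed out waiting for snapshot ${id}" >&2
  return 1
}
```

A job could then chain `wait_for_snapshot "$id" && <download step>`, so that a failed or stalled snapshot is never downloaded as if it were complete.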
-
Download snapshot:
Code Block 3 CLI
cfy snapshots download [OPTIONS] SNAPSHOT_ID
Code Block 4 REST
curl -X GET --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>/archive" > <snapshot-archive-filename>.zip
Parameter specifications are available in the Cloudify API documentation.
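With daily snapshots, downloaded archives accumulate quickly. One possible retention step, sketched below, keeps only the newest KEEP *.zip files in a backup directory and deletes the rest. The directory and the retention count are assumptions to align with your own backup policy. Ordering relies on file modification times as reported by ls -t, so it assumes archives are not modified after download.

```shell
#!/bin/sh
# Sketch: keep only the newest KEEP snapshot archives in a directory.
# The directory layout and *.zip naming are assumptions for this example.

# prune_archives DIR KEEP: delete all but the KEEP most recent *.zip files.
prune_archives() {
  dir=$1
  keep=$2
  # ls -t lists newest first; tail skips the first KEEP entries.
  ls -1t "$dir"/*.zip 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}
```

For example, after saving the REST download to a hypothetical /backups/cloudify/<snapshot-id>.zip, running `prune_archives /backups/cloudify 14` keeps two weeks of daily archives.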
Applying snapshot
-
Upload snapshot:
Code Block 5 CLI
cfy snapshots upload [OPTIONS] SNAPSHOT_PATH
Code Block 6 REST
curl -X PUT --header "Tenant: <manager-tenant>" -u <manager-username>:<manager-password> "http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>/archive?snapshot_archive_url=http://url/to/archive.zip"
Parameter specifications are available in the Cloudify API documentation.
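The REST example above uploads from a URL. To upload an archive from the local machine instead, the Cloudify REST reference describes sending the zip file itself as an application/octet-stream body to PUT /api/v3.1/snapshots/<snapshot-id>/archive. The helper below assumes that behaviour and uses curl --upload-file so that large archives are streamed rather than read into memory. The function and variable names are hypothetical; check the request against your manager version before relying on it.

```shell
#!/bin/sh
# Sketch: upload a local snapshot archive through the REST API.
# Assumes PUT /api/v3.1/snapshots/<snapshot-id>/archive accepts the zip file
# as an application/octet-stream body; MANAGER_* names are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# upload_snapshot_file SNAPSHOT_ID ARCHIVE_PATH
upload_snapshot_file() {
  [ -f "$2" ] || { echo "archive not found: $2" >&2; return 1; }
  # --upload-file streams the file and makes curl send a PUT request.
  run curl -s -f \
    --header "Tenant: ${MANAGER_TENANT}" \
    --header "Content-Type: application/octet-stream" \
    -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
    --upload-file "$2" \
    "http://${MANAGER_IP}/api/v3.1/snapshots/$1/archive"
}
```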
-
Restore snapshot:
Code Block 7 CLI
cfy snapshots restore [OPTIONS] --tenant-name <TEXT> SNAPSHOT_ID
Code Block 8 REST
curl -s -X POST \
--header "Content-Type: application/json" \
--header "Tenant: <manager-tenant>" \
-u <manager-username>:<manager-password> \
-d '{"tenant_name": "<manager-tenant>", "recreate_deployments_envs": true, "force": false, "restore_certificates": false, "no_reboot": false}' \
"http://<manager-ip>/api/v3.1/snapshots/<snapshot-id>/restore"
Parameter specifications are available in the Cloudify API documentation.
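Before restoring, a script can first confirm that the snapshot can be fetched from the manager. The sketch below does that pre-flight check with a GET on the snapshot resource and then sends the same restore request body as the REST example above, with the tenant passed in as an argument and force left false. The function and MANAGER_* variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: restore a snapshot after checking that the manager can return it.
# The request body mirrors the REST restore example in this guide;
# MANAGER_* names are hypothetical environment variables.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_snapshot SNAPSHOT_ID TENANT_NAME
restore_snapshot() {
  # Pre-flight: -f makes curl fail on an HTTP error (e.g. unknown snapshot).
  run curl -s -f -o /dev/null \
    --header "Tenant: ${MANAGER_TENANT}" \
    -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
    "http://${MANAGER_IP}/api/v3.1/snapshots/$1" || {
    echo "could not fetch snapshot $1 from ${MANAGER_IP}" >&2
    return 1
  }
  body=$(printf '{"tenant_name": "%s", "recreate_deployments_envs": true, "force": false, "restore_certificates": false, "no_reboot": false}' "$2")
  run curl -s -f -X POST \
    --header "Content-Type: application/json" \
    --header "Tenant: ${MANAGER_TENANT}" \
    -u "${MANAGER_USER}:${MANAGER_PASSWORD}" \
    -d "$body" \
    "http://${MANAGER_IP}/api/v3.1/snapshots/$1/restore"
}
```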
Failure Recovery
Whole cluster down or malfunctioning
- Save the certificate files in /etc/cloudify/ssl/ to a location outside the managers.
- Tear down the managers.
- Install fresh managers, pointing /etc/cloudify/config.yaml at the saved certificates.
- Create the cluster and join the managers to it.
- Apply the latest working snapshot on the active manager.
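The first step, saving the certificates, can be scripted so that the copy is taken the same way every time before a teardown. The sketch below archives the contents of /etc/cloudify/ssl into a timestamped tarball created with owner-only permissions, since the directory typically holds private keys. The destination directory is an assumption and must be somewhere that survives the teardown; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: archive the manager certificates before tearing the cluster down.
# /etc/cloudify/ssl is the directory named in this guide; DEST_DIR is an
# assumption and must be a location that survives the teardown.

# save_certs SRC_DIR DEST_DIR: tar up SRC_DIR and print the archive path.
save_certs() {
  src=$1
  dest=$2
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  archive="$dest/cloudify-ssl-$(date +%Y%m%d%H%M%S).tar.gz"
  # umask 077 applies only inside the subshell, so the archive is
  # owner-readable only; -C stores paths relative to SRC_DIR.
  (umask 077 && tar -czf "$archive" -C "$src" .) || return 1
  printf '%s\n' "$archive"
}
```

Run as root on a manager, for example `save_certs /etc/cloudify/ssl /mnt/backup/cloudify`, where /mnt/backup/cloudify stands for a hypothetical off-host mount.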
One manager cluster node down
- Remove the failed manager from the cluster.
- Destroy the manager.
- Bootstrap a fresh manager.
- Join it to the existing cluster.
Effect: Healthy manager cluster
Active manager node down
- Another healthy manager in the cluster automatically becomes the active manager.
- Investigate the error.
- Then either fix the problem, or replace the manager:
  - Destroy the manager.
  - Install a new manager.
  - Join it to the cluster.
Effect: Healthy manager cluster
Split brain
Description: A split brain occurs when the managers lose connectivity with each other for a period of time. Each manager then considers the others unhealthy and promotes itself to master. Once connectivity is restored, only one master remains in the cluster: the one with the newest version of the PostgreSQL database. The other managers are synced with the active one and become standbys, so any data, installed deployments and plugins on those managers that are not on the active one are lost.