Cloudify cluster day 2 operations

Introduction

A Cloudify cluster has up to three component clusters: Database, Message Queue, and Manager.

While a Cloudify cluster is running, you may need to perform maintenance on these components, such as removing faulty or missing nodes or adding new nodes.

Cloudify manager operations

List

Listing managers does not modify the running cluster, so it can safely be run without maintenance mode.

To see current cluster members, run the following command:

  cfy cluster managers list

Remove

To remove a manager from the cluster, first ensure that the manager itself is uninstalled or the VM/container it is on has been deleted. If you are in any doubt about the state of the old manager node, it is recommended to ensure that its VM/container has been deleted.

Then, from any machine with a CLI configured to access the cluster, run the following command:

  cfy cluster remove <manager name as it appears in the hostname field of the cfy cluster managers list>

After removal is complete, you can verify that the cluster is healthy by checking the managers list, and the cluster status:

  cfy cluster managers list
  cfy cluster status

All expected managers should be listed, and the status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.
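
Because the status can take a short while to settle, it may help to re-run the check a few times rather than rely on a single reading. The following is a minimal sketch, not part of the Cloudify tooling: the function name watch_cluster_status and its defaults (4 checks, 10 seconds apart) are assumptions chosen to span roughly 30 seconds. It only re-runs the commands shown above and does not interpret their output.

```shell
# Sketch: show the manager list once, then re-run "cfy cluster status"
# several times so the status can be watched while it settles.
# The defaults (4 checks, 10 seconds apart) are illustrative assumptions.
watch_cluster_status() {
    attempts="${1:-4}"
    delay="${2:-10}"

    cfy cluster managers list

    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "--- cluster status, check $i of $attempts ---"
        cfy cluster status
        # Pause between checks, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}
```

Calling watch_cluster_status with no arguments uses the defaults; watch_cluster_status 6 5 would check six times, five seconds apart.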

Add

To add a new manager node, install the node with the same network, DB and broker settings in the config.yaml as the existing managers.
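
One way to check that the settings match before installing is to diff the new manager's config.yaml against a copy taken from an existing manager. This is a sketch under assumptions: the function name compare_manager_configs is hypothetical, both file paths are supplied by the caller, and some node-specific values will presumably differ legitimately, so the output is something to review rather than a pass/fail result.

```shell
# Sketch: compare an existing manager's config.yaml with the one prepared
# for the new manager. Both paths are supplied by the caller.
compare_manager_configs() {
    existing_config="$1"
    new_config="$2"

    diff -u "$existing_config" "$new_config"
    status=$?

    # diff exits 0 when the files match, 1 when they differ, >1 on error.
    case "$status" in
        0) echo "No differences found." ;;
        1) echo "Differences found: check the network, DB and broker settings above." ;;
        *) echo "diff could not compare the files." >&2 ;;
    esac
    return "$status"
}
```

For example, compare_manager_configs existing-manager-config.yaml new-manager-config.yaml, where both file names are placeholders for local copies of the two files.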

After install is complete, you can verify that the cluster is healthy by checking the managers list, and the cluster status:

  cfy cluster managers list
  cfy cluster status

All expected managers should be listed, and the status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.

Message queue operations

List

Listing brokers does not modify the running cluster, so it can safely be run without maintenance mode.

To see current cluster brokers from a CLI connected to the cluster, run:

  cfy cluster brokers list

Alternatively, to get a listing including active alarms, from a broker run:

  cfy_manager brokers list

Remove

To remove a broker from a cluster, first ensure that the broker itself is uninstalled or the VM/container it is on has been deleted. If you are in any doubt about the state of the old broker node, it is recommended to ensure that its VM/container has been deleted.

Next, both of the following steps must be performed:

From one of the broker nodes, remove the node with the following command:

  cfy_manager brokers remove -r <broker name as it appears in the broker_name field of the cfy_manager brokers list>

From a CLI connected to the cluster, run:

  cfy cluster brokers remove <broker name as it appears in the name field of the cfy cluster brokers list>
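
The two steps above run in different places and take their broker name from different listings, which makes it easy to skip one. The sketch below prints both commands as a checklist without executing anything. The function name print_broker_removal_plan is hypothetical, and the two arguments are whatever values the respective list commands show for the broker being removed.

```shell
# Sketch: print the two broker removal commands with a note on where each
# one runs. Nothing is executed; the output is a checklist for the operator.
print_broker_removal_plan() {
    rabbit_broker_name="$1"   # broker_name field of "cfy_manager brokers list"
    cluster_broker_name="$2"  # name field of "cfy cluster brokers list"

    echo "1. On one of the remaining broker nodes:"
    echo "     cfy_manager brokers remove -r $rabbit_broker_name"
    echo "2. From a CLI connected to the cluster:"
    echo "     cfy cluster brokers remove $cluster_broker_name"
}
```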

After removing the broker, you can verify that the cluster is healthy by checking the brokers list, and the cluster status:

  cfy cluster brokers list
  cfy cluster status

All expected brokers should be listed, and the status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.

Add

To add a new broker node, install the node with the same cluster nodes list, networks, and Erlang cookie in its config.yaml as the existing brokers.

You can confirm the broker has been added to the RabbitMQ cluster properly by listing brokers on any of the broker nodes:

  cfy_manager brokers list

After install is complete, you will need to add the broker to the manager cluster’s known brokers by running:

  cfy cluster brokers add <hostname of new broker> <IP or resolvable DNS name of new broker>

After adding the broker, you can verify that the cluster is healthy by checking the brokers list, and the cluster status:

  cfy cluster brokers list
  cfy cluster status

All expected brokers should be listed, and the status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.

Database operations

List

Listing DB nodes does not modify the running cluster, so it can safely be run without maintenance mode.

DB nodes can be listed from DB or manager nodes using the command:

  cfy_manager dbs list

Remove

To remove a DB from a cluster, first ensure that the DB itself is uninstalled or the VM/container it is on has been deleted. If you are in any doubt about the state of the old DB node, it is recommended to ensure that its VM/container has been deleted.

Next, the DB must be removed from the DB cluster using the following command on any current DB node:

  cfy_manager dbs remove -a <DB address as it appears in the node_ip field of the cfy_manager dbs list>

Then, on every manager the following command must be run:

  cfy_manager dbs remove -a <DB address as it appears in the node_ip field of the cfy_manager dbs list>

Following this, the manager state should be updated by running the following command once with any CLI connected to the cluster:

  cfy db-nodes update

After removing the DB node, you can verify that the DB cluster settings are correct by checking the DB nodes list on one DB node and all manager nodes with:

  cfy_manager dbs list

All expected DB nodes should be listed on the DB node and all managers. Then, you can confirm the cluster is healthy with:

  cfy cluster status

The status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.
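
The removal step that must be run on every manager can be scripted when there are several managers. The following sketch assumes SSH access to each manager and that the login user is allowed to run cfy_manager there; the function name remove_db_from_managers and the host names passed to it are hypothetical. It runs the same cfy_manager dbs remove -a command shown above on each host and reports any host where the command failed.

```shell
# Sketch: run "cfy_manager dbs remove -a <address>" on every manager over SSH.
# Assumes SSH access to each manager and sufficient privileges for cfy_manager.
remove_db_from_managers() {
    db_address="$1"
    shift

    failed=0
    for manager_host in "$@"; do
        echo "Removing DB node $db_address on manager $manager_host"
        if ! ssh "$manager_host" cfy_manager dbs remove -a "$db_address"; then
            # Keep going so every manager is attempted, but remember the failure.
            echo "Removal failed on $manager_host" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, remove_db_from_managers 192.0.2.10 manager-1 manager-2, where the address and host names are placeholders. The cfy db-nodes update step still needs to be run once afterwards.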

Add

To add a new DB node, install the node with the same DB cluster settings as the existing DB nodes in its config.yaml, making sure the cluster nodes list includes the new node.

After install is complete, you will need to add the DB to each manager in the cluster. On every manager, run:

  cfy_manager dbs add -a <IP or resolvable DNS name of new DB node>

Following this, the manager state should be updated by running the following command once with any CLI connected to the cluster:

  cfy db-nodes update

After adding the DB node, you can verify that the DB cluster settings are correct by checking the DB nodes list on one DB node and all manager nodes with:

  cfy_manager dbs list

All expected DB nodes should be listed on the DB node and all managers. Then, you can confirm the cluster is healthy with:

  cfy cluster status

The status should be healthy. Note that cluster status can take up to ~30 seconds to stabilise.

Set master

If you wish to change the current DB master node, e.g. because the current master node is about to undergo maintenance, run the following command on a DB node:

  cfy_manager dbs set-master -a <intended new master's DB address as it appears in the node_ip field of the cfy_manager dbs list>

Re-initialise

If one of the DB replicas is failing to replicate and its lag keeps growing, it can be fixed by running the following command on a DB node:

  cfy_manager dbs reinit -a <address of the DB node to re-initialise, as it appears in the node_ip field of the cfy_manager dbs list>