Taking Leaves Offline without Cluster Downtime

Occasionally hosts need to be taken offline for maintenance (upgrading memory, etc). This can present a challenge if these hosts are home to one or more MemSQL leaf nodes.

By following the steps below, you can detach MemSQL leaf nodes from a MemSQL cluster, take the host offline for maintenance, and attach the leaves back to the cluster following maintenance. This can all be done without downtime to the MemSQL cluster.

Assumptions:

  • The steps below assume the host IP addresses will not change during maintenance. If the IP addresses will change during maintenance, then follow this guide.

  • The steps below assume the cluster is configured for High Availability (redundancy 2). If both leaves in a paired group of leaves are detached from the cluster the cluster will become unavailable and downtime will be experienced. For this reason only one availability group of leaves should be detached at a time.

Step 1: Enable manual cluster control of the MemSQL cluster

Manual cluster control is enabled by running the following command:

memsql-ops cluster-manual-control --enable

For more information on this command, see the reference.

The cluster is now under manual control. We will not automatically restart stopped nodes, update license files, etc.

Enabling manual cluster control will prevent the leaf nodes we remove from automatically being re-added to the cluster.

You can check the current status (enabled or disabled) of manual cluster control in the MemSQL Ops Dashboard:

image

Step 2: Check long running queries section of Ops web UI

Before removing leaves make sure there are no long running queries present in the cluster:

image

Step 3: Ensure all database partitions are balanced

Read the Understanding Orphaned Partitions topic to verify if any orphaned partitions exist in the cluster. If there are, this topic explains how to resolve them.

Step 4. Confirm the leaf node you want to take offline has an online paired leaf on a different host

To confirm this, run SHOW LEAVES and check the results. Suppose you have a leaf node running on 192.168.1.110 and you want to take it offline. To confirm it has an online paired leaf, run SHOW LEAVES and observe that this node’s paired host is 192.168.1.112 and the paired host is online:

+----------------+------+--------------------+---------------+------------+--------+--------------------+------------------------------+--------+
| Host           | Port | Availability_Group | Pair_Host     |  Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+-------+--------+
| 192.168.1.110  | 3307 |                  1 | 192.168.1.112 |  3307      | online |                  2 |                        0.168 |      6 |
| 192.168.1.112  | 3307 |                  2 | 192.168.1.110 |  3307      | online |                  2 |                        0.255 |      8 |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+-------+--------+
2 rows in set (0.00 sec)

Step 5: Detach the MemSQL leaf node(s) from the host to be taken offline for maintenance

A MemSQL leaf node is deatched from a MemSQL cluster by using the following syntax:

DETACH LEAF'host':port;

For more information on this command see the reference.

Note: If both leaves in a paired group of leaves are detached from the cluster will become unavailable and downtime will be experienced. For this reason only one availability group of leaves should be detached at a time.

Step 6: Detach the MemSQL aggregator(s) from the host to be taken offline for maintenance

Use the following syntax to detach an aggregator from a host:

REMOVE AGGREGATOR 'host':port;

For more information on this command see the reference.

Step 7: Stop the MemSQL node(s)

Stop the MemSQL node(s) (leaves and aggregators) residing on all hosts that will be taken offline for maintenance. Stop the MemSQL nodes using memsql-ops memsql-stop.

For more information on this command see the reference.

Step 8: Stop the Ops follower agent

Stop the Ops follower agent residing on all hosts that will be taken offline for maintenance. You can shut down a follower agent using memsql-ops agent-stop OR by SSHing directly to the host and running memsql-ops stop.

For more information on this command see the reference.

Step 9: Take the host offline, perform maintenance, bring host back online and confirm MemSQL is running

It is now safe to power down the host and perform maintenance. After performing maintenance bring the host back online.

Step 10: Start the Ops follower agent

Start the Ops follower agent residing on all hosts that were previously taken offline for maintenance and are now back online. You can start a follower agent using memsql-ops agent-start OR by SSHing directly to the host and running memsql-ops start.

Step 11: Start the MemSQL node(s)

Start the MemSQL node(s) (leaves and aggregators) residing on all hosts that were previously taken offline for maintenance and are now back online. Start the MemSQL node(s) using memsql-ops memsql-start.

Step 12: Attach the MemSQL leaf node(s) back to the host that was taken offline for maintenance

Once maintenance is completed, the host is back online and MemSQL is running attach the MemSQL leaf node(s) back to the cluster.

A MemSQL leaf node is attached to a MemSQL cluster by using the following command from the master aggregator node:

ATTACH LEAF 'host':port NO REBALANCE;

For more information on this command, see the reference.

Note: If you took multiple leaf nodes offline and are attaching them back to the cluster you can use the reference to attach all detached leaves with one command:

Step 13: Attach the MemSQL aggregator(s) back to the host that was taken offline for maintenance

Use the following syntax to detach an aggregator from a host:

ADD AGGREGATOR user:'password'@'host':port;

For more information on this command see the reference.

Step 14: Disable manual control of the MemSQL cluster

Manual cluster control is disabled by running the following command:

memsql-ops cluster-manual-control --disable

For more information on this command, see the reference. The cluster is no longer under manual control. We will now automatically restart stopped nodes, update license files, etc.

Step 15: Run REBALANCE PARTITIONS

After attaching the MemSQL leaf node(s) to the host run the following command on your MemSQL master aggregator node for each of your databases:

REBALANCE PARTITIONS ON <db_name_here>;

For more information on this command, see the reference.

Running rebalance partitions will redistribute data across your cluster. In doing so a portion of data in your cluster will be relocated to the MemSQL nodes that were attached in step 12.

Was this article useful?