Taking Leaves Offline without Cluster Downtime

Info

If you are managing your cluster with MemSQL Ops, follow the MemSQL Ops version of this procedure instead.

Occasionally, hosts need to be taken offline for maintenance (for example, to upgrade memory). This is a challenge when those hosts run one or more MemSQL leaf nodes.

By following the steps below, you can detach MemSQL leaf nodes from a MemSQL cluster, take the host offline for maintenance, and attach the leaves back to the cluster following maintenance. This can all be done without downtime to the MemSQL cluster.

Assumptions:

  • The steps below assume the host IP addresses will not change during maintenance.

  • The steps below assume the cluster is configured for High Availability (redundancy 2). If both leaves in a pair are detached at the same time, the cluster becomes unavailable and you will experience downtime. For this reason, detach leaves from only one availability group at a time.

Step 1: Check for long-running queries

Before removing leaves, make sure no long-running queries are active in the cluster. To check, run the following in the SQL Editor in MemSQL Studio:

SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND = 'QUERY' AND STATE = 'executing';
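The query above returns every executing query, not only the long-running ones. A minimal sketch of a filter is shown below. It assumes the rows have been fetched as dictionaries by a MySQL-compatible client of your choice, and that PROCESSLIST exposes a TIME column in seconds as it does in MySQL. The 60-second threshold and the sample rows are hypothetical.

```python
from typing import Dict, List


def long_running(rows: List[Dict], min_seconds: int = 60) -> List[Dict]:
    """Return processlist rows running for at least min_seconds, longest first."""
    hits = [row for row in rows if int(row["TIME"]) >= min_seconds]
    return sorted(hits, key=lambda row: int(row["TIME"]), reverse=True)


# Hypothetical rows, shaped like the result of the PROCESSLIST query.
rows = [
    {"ID": 101, "TIME": 3, "INFO": "SELECT 1"},
    {"ID": 102, "TIME": 950, "INFO": "INSERT INTO t SELECT ..."},
    {"ID": 103, "TIME": 75, "INFO": "SELECT COUNT(*) FROM big"},
]

for row in long_running(rows):
    print(row["ID"], row["TIME"], row["INFO"])
```

If the filter returns any rows, wait for those queries to finish (or coordinate with their owners) before continuing.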

Step 2: Ensure all database partitions are balanced

Read the Understanding Orphaned Partitions topic to check whether any orphaned partitions exist in the cluster. If any do, that topic explains how to resolve them.

Step 3: Confirm the leaf node you want to take offline has an online paired leaf on a different host

To confirm this, run memsql-admin show-leaves and check the results. For example, suppose you want to take the leaf nodes running on 172.18.1.5 offline. In the output below, each leaf on 172.18.1.5 is paired with a leaf on 172.18.1.6, and those paired leaves are online:

memsql-admin show-leaves
****
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|    Host    | Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 172.18.1.5 | 3306 | 1                  | 172.18.1.6 | 3306      | online | 1                  | 1.538                          |
| 172.18.1.5 | 3307 | 1                  | 172.18.1.6 | 3307      | online | 2                  | 0.765                          |
| 172.18.1.6 | 3306 | 2                  | 172.18.1.5 | 3306      | online | 2                  | 0.898                          |
| 172.18.1.6 | 3307 | 2                  | 172.18.1.5 | 3307      | online | 2                  | 1.491                          |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
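The same check can be scripted when a host runs many leaves. The sketch below parses the table printed by show-leaves and confirms that every leaf on a given host has an online pair on a different host. The helper names parse_leaves and safe_to_detach are hypothetical, and the parser assumes the output keeps the column layout shown above.

```python
from typing import Dict, List

# Sample output copied from the example above.
SAMPLE = """\
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|    Host    | Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 172.18.1.5 | 3306 | 1                  | 172.18.1.6 | 3306      | online | 1                  | 1.538                          |
| 172.18.1.5 | 3307 | 1                  | 172.18.1.6 | 3307      | online | 2                  | 0.765                          |
| 172.18.1.6 | 3306 | 2                  | 172.18.1.5 | 3306      | online | 2                  | 0.898                          |
| 172.18.1.6 | 3307 | 2                  | 172.18.1.5 | 3307      | online | 2                  | 1.491                          |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
"""


def parse_leaves(output: str) -> List[Dict[str, str]]:
    """Parse the ASCII table into one dict per leaf, keyed by column title."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")  # skip borders and status lines
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def safe_to_detach(leaves: List[Dict[str, str]], host: str) -> bool:
    """True if every leaf on `host` has an online pair on a different host."""
    state = {(leaf["Host"], leaf["Port"]): leaf["State"] for leaf in leaves}
    on_host = [leaf for leaf in leaves if leaf["Host"] == host]
    return bool(on_host) and all(
        leaf["Pair Host"] != host
        and state.get((leaf["Pair Host"], leaf["Pair Port"])) == "online"
        for leaf in on_host
    )


print(safe_to_detach(parse_leaves(SAMPLE), "172.18.1.5"))  # prints True
```

The function returns False for a host with no leaves, so a mistyped address is not reported as safe.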

Step 4: Detach the MemSQL leaf or aggregator node(s) on the host to be taken offline for maintenance

A MemSQL leaf node is detached from a MemSQL cluster by using the following syntax:

DETACH LEAF 'host':port;

For more information on this command see the reference.

Note: If both leaves in a paired group of leaves are detached, the cluster will become unavailable and you will experience downtime. For this reason, detach leaves from only one availability group at a time.
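One way to enforce the rule in the note is to generate the DETACH LEAF statements from a list of targets and refuse to continue if the targets span more than one availability group. This is a sketch only: the detach_statements helper and the tuple layout (host, port, availability group) are hypothetical, and the generated statements still have to be run by you against the cluster.

```python
from typing import List, Tuple

# (host, port, availability_group) for each leaf to detach; layout is hypothetical.
Target = Tuple[str, int, int]


def detach_statements(targets: List[Target]) -> List[str]:
    """Build DETACH LEAF statements, rejecting targets in more than one group."""
    groups = {group for _, _, group in targets}
    if len(groups) > 1:
        raise ValueError(
            f"targets span availability groups {sorted(groups)}; "
            "detach one group at a time"
        )
    return [f"DETACH LEAF '{host}':{port};" for host, port, _ in targets]


# The two leaves on 172.18.1.5 from the Step 3 example, both in group 1.
for statement in detach_statements([("172.18.1.5", 3306, 1), ("172.18.1.5", 3307, 1)]):
    print(statement)
```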

For host machines running aggregator nodes, use the following syntax to remove an aggregator from the cluster:

REMOVE AGGREGATOR 'host':port;

For more information on this command see the reference.

Step 5: Stop the MemSQL node(s)

Stop the MemSQL node(s) (leaves and aggregators) residing on all hosts that will be taken offline for maintenance.

memsql-admin stop-node --memsql-id <MemSQL_ID>

For more information on this command see the reference.

Step 6: Take the host offline, perform maintenance, bring host back online and confirm MemSQL is running

It is now safe to power down the host and perform maintenance. After performing maintenance, bring the host back online.

Step 7: Start the MemSQL node(s)

Start the MemSQL node(s) (leaves and aggregators) residing on all hosts that were previously taken offline for maintenance and are now back online.

memsql-admin start-node --memsql-id <MemSQL_ID>
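When a host runs several nodes, Steps 5 and 7 repeat the same command once per node. The sketch below builds one command line per MemSQL ID using only the stop-node and start-node forms shown in this topic. The node_commands helper and the ID values are hypothetical. It prints the commands instead of running them; to execute one, you could pass its argument list to subprocess.run.

```python
import shlex
from typing import List


def node_commands(action: str, memsql_ids: List[str]) -> List[List[str]]:
    """Return one memsql-admin argument list per node ID."""
    if action not in ("stop-node", "start-node"):
        raise ValueError(f"unsupported action: {action}")
    return [["memsql-admin", action, "--memsql-id", node_id] for node_id in memsql_ids]


# Hypothetical placeholder IDs; substitute the IDs of the nodes on your host.
ids = ["ID_OF_LEAF_1", "ID_OF_LEAF_2"]

for argv in node_commands("stop-node", ids):
    print(shlex.join(argv))
```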

Step 8: Attach the MemSQL leaf or aggregator node(s) on the host that was taken offline back to the cluster

Once maintenance is complete, the host is back online, and MemSQL is running, attach the MemSQL leaf or aggregator node(s) back to the cluster.

A MemSQL leaf node is attached to a MemSQL cluster by using the following command from the master aggregator node:

ATTACH LEAF 'host':port NO REBALANCE;

For more information on this command, see the reference.

Note: If you took multiple leaf nodes offline, you can attach all detached leaves back to the cluster with a single ATTACH LEAF ALL command. For details, see the reference.

To attach an aggregator node back to a cluster, use the following syntax:

ADD AGGREGATOR user:'password'@'host':port;

For more information on this command see the reference.

Step 9: Rebalance cluster partitions

After attaching the MemSQL leaf node(s) to the cluster, run the following command on your MemSQL master aggregator node for each of your databases:

REBALANCE PARTITIONS ON <db_name_here>;

For more information on this command, see the reference.

Running REBALANCE PARTITIONS redistributes data across your cluster. In doing so, a portion of the data in your cluster is relocated to the MemSQL nodes that were attached in Step 8.
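Because the command must be run once per database, it can help to generate the statements from a list of database names. The sketch below is one way to do that. It assumes the names come from a source such as SHOW DATABASES, that they are plain identifiers needing no quoting, and that information_schema, memsql, and cluster are system databases to skip. The helper name and the sample database names are hypothetical.

```python
from typing import Iterable, List

# Assumed system databases that should not be rebalanced by hand.
SYSTEM_DATABASES = {"information_schema", "memsql", "cluster"}


def rebalance_statements(databases: Iterable[str]) -> List[str]:
    """Build one REBALANCE PARTITIONS statement per user database."""
    return [
        f"REBALANCE PARTITIONS ON {db};"
        for db in databases
        if db.lower() not in SYSTEM_DATABASES
    ]


# Hypothetical database list.
for statement in rebalance_statements(["information_schema", "memsql", "orders", "analytics"]):
    print(statement)
```

Run the generated statements one at a time on the master aggregator, and let each rebalance finish before starting the next.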
