You can install MemSQL on bare metal, on virtual machines, or in the cloud using popular configuration management tools, such as CloudFormation, or MemSQL’s own management tools.
In this guide, you will deploy a MemSQL cluster onto physical or virtual machines and connect to the cluster using our monitoring, profiling, and debugging tool, MemSQL Studio.
A four-node cluster is the minimal recommended cluster size for showcasing MemSQL as a distributed database with high availability; however, you can use the procedures in this tutorial to scale out to additional nodes for increased performance over large data sets or to handle higher concurrency loads. To learn more about MemSQL’s design principles and topology concepts, see Distributed Architecture.
In this tutorial, you will learn how to deploy a MemSQL cluster:
As a sudo user who can install MemSQL via Red Hat or Debian packages
As a non-sudo user who can install MemSQL via tarball
From the main deployment host that can connect to the Internet (online deployment) or that cannot connect to the Internet (offline deployment)
Using individual MemSQL tools on the command line
Using a cluster configuration file (a.k.a. “cluster file”)
There are no licensing costs for using up to four license units for the leaf nodes in your cluster. If you need a larger cluster with more or larger leaf nodes, please create an Enterprise License trial key.
For this tutorial you will need:
Physical or virtual machines (or “hosts”) with the following:
At least four (4) x86_64 CPU cores and eight (8) GB of RAM per host (the minimum for each MemSQL node)
Eight (8) vCPU and 32 GB of RAM are recommended for leaf nodes to align with license unit calculations
A 64-bit version of RHEL/CentOS 6 or higher, or Debian 8 or higher, with kernel 3.10 or higher
Port 3306 open on all hosts for intra-cluster communication. This default can be changed in the cluster file.
Port 8080 open on the main deployment host for the cluster
A non-root user with sudo privileges, available on all hosts in the cluster, that can be used to run MemSQL services and own the corresponding runtime state
SSH access to all hosts (installing and using ssh-agent is recommended for passphrase-protected SSH keys).
- If using SSH keys, make sure the identity key used on the main deployment host can be used to log in to the other hosts.
A connection to the Internet to download required packages
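The CPU, memory, kernel, and SSH requirements above can be spot-checked before you deploy. The following is a minimal sketch, assuming bash, GNU coreutils (`nproc`, `sort -V`), and a Linux `/proc/meminfo`. The script name `check-prereqs.sh`, its `local` and `ssh` modes, and the 10% memory margin are illustrative assumptions, not part of MemSQL's tooling, and the script does not check the OS distribution or version.

```bash
#!/usr/bin/env bash
# check-prereqs.sh (hypothetical helper, not a MemSQL tool): checks one host
# against the minimums in this guide, or checks key-based SSH from the main
# deployment host to the other hosts.

MIN_CORES=4       # at least four x86_64 CPU cores per host
MIN_MEM_GB=8      # at least 8 GB of RAM per host
MIN_KERNEL=3.10   # kernel 3.10 or higher

# version_ge A B: succeed if version string A >= B (GNU `sort -V` ordering).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# mem_ok KB GB: succeed if KB kilobytes of RAM covers GB gigabytes. MemTotal
# is slightly below installed RAM, so allow a 10% margin (an assumption).
mem_ok() {
  [ "$1" -ge $(( $2 * 1024 * 1024 * 9 / 10 )) ]
}

# check_local_host: report this host's architecture, cores, RAM, and kernel.
check_local_host() {
  local arch cores mem_kb kernel status=0
  arch=$(uname -m)
  cores=$(nproc)
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  kernel=$(uname -r)
  [ "$arch" = "x86_64" ] || { echo "FAIL: architecture is $arch, not x86_64"; status=1; }
  [ "$cores" -ge "$MIN_CORES" ] || { echo "FAIL: $cores CPU cores, need $MIN_CORES"; status=1; }
  mem_ok "$mem_kb" "$MIN_MEM_GB" || { echo "FAIL: MemTotal is $mem_kb kB, need about $MIN_MEM_GB GB"; status=1; }
  version_ge "$kernel" "$MIN_KERNEL" || { echo "FAIL: kernel $kernel, need $MIN_KERNEL or higher"; status=1; }
  [ "$status" -eq 0 ] && echo "OK: $arch, $cores cores, $mem_kb kB RAM, kernel $kernel"
  return "$status"
}

# check_ssh HOST: succeed if SSH to HOST works without a password prompt.
# BatchMode makes ssh fail instead of prompting; ssh-agent keys still work.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Do nothing unless a mode is given, so the functions can be sourced.
case "${1:-}" in
  local) check_local_host ;;
  ssh)
    shift
    for host in "$@"; do
      if check_ssh "$host"; then echo "OK: ssh $host"; else echo "FAIL: ssh $host"; fi
    done
    ;;
esac
```

Run `./check-prereqs.sh local` on each host, and `./check-prereqs.sh ssh <host> <host> ...` on the main deployment host. If your key has a passphrase, load it into ssh-agent with `ssh-add` first so the batch-mode check can use it.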
If you are deploying to a production environment, we highly recommend following our host configuration recommendations for optimal cluster performance.
As of MemSQL Toolbox 1.4.4, Toolbox checks for duplicate hosts before deploying MemSQL and displays a message similar to the following if more than one host has the same SSH host key:
✘ Host check failed.host 172.26.212.166 has the same ssh host keys as 172.16.212.165, toolbox doesn't support registering the same host twice
Confirm that all specified hosts are indeed different and aren’t using identical SSH host keys. Identical host keys can be present if you created your hosts from images (AMIs, snapshots, etc.) that contain existing host keys. When a host is cloned, its host key (typically stored in /etc/ssh/ssh_host_<cipher>_key) is cloned as well.
As each cloned host will have the same host key, an SSH client cannot verify that it is connecting to the intended host. The script that deploys MemSQL will interpret a duplicate host key as an attempt to deploy to the same host twice, and the deployment will fail.
The CentOS 7.x steps below demonstrate a potential remedy for the “duplicate hosts” message.
$ sudo -i
# ls -al /etc/ssh/
# rm /etc/ssh/<your-ssh-host-keys>
# ssh-keygen -f /etc/ssh/<ssh-host-key-filename> -N '' -t rsa1
# ssh-keygen -f /etc/ssh/<ssh-host-rsa-key-filename> -N '' -t rsa
# ssh-keygen -f /etc/ssh/<ssh-host-dsa-key-filename> -N '' -t dsa
# systemctl restart sshd
For more information about SSH host keys, including the equivalent steps for Ubuntu-based systems, refer to Avoid Duplicating SSH Host Keys.
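You can also look for cloned host keys yourself from the main deployment host, before deploying or after regenerating keys. This minimal sketch assumes OpenSSH's `ssh-keyscan` is installed and can reach port 22 on each host; the script name `find-dup-host-keys.sh` and its output format are hypothetical. It compares the public keys each host presents, which is the same condition Toolbox reports as "has the same ssh host keys as".

```bash
#!/usr/bin/env bash
# find-dup-host-keys.sh (hypothetical helper, not part of MemSQL Toolbox):
# flags hosts that present identical SSH host keys.

# find_duplicate_keys: read ssh-keyscan output ("host keytype base64key") on
# stdin and print "FIRST_HOST DUPLICATE_HOST" for every host whose key was
# already seen on a different host.
find_duplicate_keys() {
  awk '
    NF >= 3 && $1 !~ /^#/ {
      key = $2 " " $3                        # key type plus key material
      if (!(key in owner)) owner[key] = $1   # first host seen with this key
      else if (owner[key] != $1) print owner[key], $1
    }' | sort -u
}

# With host arguments, scan them (default key types) and report duplicates.
if [ "$#" -gt 0 ]; then
  dups=$(ssh-keyscan "$@" 2>/dev/null | find_duplicate_keys)
  if [ -n "$dups" ]; then
    echo "Hosts sharing SSH host keys (first host, duplicate):"
    echo "$dups"
    exit 1
  fi
  echo "No duplicate SSH host keys found among: $*"
fi
```

For example, `./find-dup-host-keys.sh 10.0.0.1 10.0.0.2 10.0.0.3` (illustrative addresses). Hosts that `ssh-keyscan` cannot reach produce no output and are therefore not checked.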
As of MemSQL Toolbox 1.5.3, memsql-deploy setup-cluster supports an --allow-duplicate-host-fingerprints option that can be used to ignore duplicate SSH host keys.
Depending on each host's role in the deployment, some or all of the following ports should be open on the hosts in your cluster.
These routing and firewall settings must be configured to:
Allow database clients (e.g. your application) to connect to the MemSQL aggregators
Allow all nodes in the cluster to talk to each other over the MemSQL protocol (3306)
Allow you to connect to management and monitoring tools
| Protocol | Default Port | Direction | Description |
|---|---|---|---|
| TCP | 22 | Inbound and Outbound | For host access. Required between nodes in MemSQL tool deployment scenarios. Also useful for remote administration and troubleshooting on the main deployment host. |
| TCP | 443 | Outbound | To get the public repo key for package verification. Required for nodes downloading MemSQL APT or YUM packages. |
| TCP | 3306 | Inbound and Outbound | Default port used by MemSQL. Required on all nodes for intra-cluster communication. Also required on aggregators for client connections. |
| TCP | 8080 | Inbound and Outbound | Default port for MemSQL Studio. (Only required for the host running Studio.) |
The service port values are configurable if the default values cannot be used in your deployment environment. For more information on how to change them, see:
The cluster file template provided in this guide
The memsql-toolbox-config register-host command
We also highly recommend configuring your firewall to prevent other hosts on the Internet from connecting to MemSQL.
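The port table and firewall recommendation can be turned into host firewall rules. As a minimal sketch, assuming RHEL/CentOS 7 or later with firewalld and the default ports, the script below prints (but does not run) `firewall-cmd` commands that allow SSH, limit TCP 3306 to the cluster network, and optionally open 3306 to an application network and 8080 to an admin network. The script name `firewall-rules.sh`, its role names, and the `CLUSTER_CIDR`, `CLIENT_CIDR`, and `ADMIN_CIDR` variables are hypothetical.

```bash
#!/usr/bin/env bash
# firewall-rules.sh (hypothetical helper): prints, but does not run, firewalld
# commands for the default MemSQL ports. Review the output before applying it.

CLUSTER_CIDR=${CLUSTER_CIDR:-10.0.0.0/24}  # hypothetical: network of all cluster hosts

# rich_rule CIDR PORT: accept TCP traffic to PORT only from addresses in CIDR.
rich_rule() {
  printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$1" "$2"
}

# firewall_cmds ROLE...: ROLE is node, aggregator, or studio.
# Only inbound rules are emitted: firewalld allows outbound traffic by
# default, which covers TCP 443 for package repositories.
firewall_cmds() {
  echo "firewall-cmd --permanent --add-service=ssh"  # TCP 22
  rich_rule "$CLUSTER_CIDR" 3306                       # intra-cluster traffic
  local role
  for role in "$@"; do
    case "$role" in
      node) ;;
      aggregator)  # application clients (CLIENT_CIDR is hypothetical)
        if [ -n "${CLIENT_CIDR:-}" ]; then rich_rule "$CLIENT_CIDR" 3306; fi ;;
      studio)      # MemSQL Studio (ADMIN_CIDR is hypothetical)
        rich_rule "${ADMIN_CIDR:-$CLUSTER_CIDR}" 8080 ;;
      *) echo "unknown role: $role" >&2; return 1 ;;
    esac
  done
  echo "firewall-cmd --reload"  # apply the permanent rules to the running firewall
}

if [ "$#" -gt 0 ]; then
  firewall_cmds "$@"
fi
```

For example, `CLUSTER_CIDR=10.0.0.0/24 ./firewall-rules.sh node studio` on the main deployment host. After reviewing the printed commands, you can apply them with `./firewall-rules.sh node studio | sudo sh -e`. Because 3306 is accepted only from the listed networks, other hosts on the Internet cannot reach MemSQL through these rules. Adjust the port numbers if you changed the defaults.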