Quick Start with Amazon Webservices

In Amazon Web Services (AWS), you can create a MemSQL single-instance cluster (i.e. master aggregator and leaf node in a single machine) quickly through the AWS Marketplace, use a CloudFormation template to deploy a multi-node cluster, or you can set up a full MemSQL cluster manually by provisioning EC2 instances and deploying MemSQL on those machines.

Using the AWS Marketplace

Launching a single-instance cluster on Amazon AWS Marketplace is simple and fast.

Click on the link above to go immediately to “1-Click” launching with the default settings. By default MemSQL will launch on an m3.2xlarge instance (4 CPU cores and 15GB of RAM), with access to ports 22 (ssh) and 80 (MemSQL Ops UI) open, and the first SSH key on your list. It will launch into your default AWS region, for example “US East (N. Virgina)”.

Info

MemSQL has no licensing costs when the total RAM size of the cluster is at 128 GB of RAM or less. Larger cluster sizes or paid support requires an Enterprise license.

Once the instance is provisioned, you can access the MemSQL Ops UI in your browser [TODO: Verify that it’s Ops or Studio], using the IP address or public DNS name of your instance. The Ops UI user is “memsql” and the password is the AWS instance ID. You can find the instance ID in your EC2 console.

http://<your-instance-ip>

user: memsql
password: <instance-id>
Info

The default port for MemSQL Ops is 9000. However, in the AWS marketplace image of MemSQL, it is configured to listen on port 80, the default HTTP port.

To connect to the MemSQL database with a MySQL-compatible client from a different machine, make sure to open port 3306 in your Security Group. The default MemSQL user is “root”, and the password is the AWS instance ID. You can find the instance ID in your EC2 console.

Optional Settings

Below are instructions for changing the default settings on your instance via the Marketplace.

Choose Region and Instance Type

AWS has several regions throughout the world. The instance types available vary by region. We recommend the m3 or m4.2xlarge instance for good price/performance. For larger RAM capacity, the R series has a higher ratio of RAM to CPUs. For columnstore workloads, the I and D series come with ample disk storage.

Info

Some newer instance types such as the m4 series are not available under “EC2 Classic” networking, and require a virtual private cloud (VPC) and subnet to launch into.

Choose Security Group

By default MemSQL launches with port 22 (ssh) and 80 (MemSQL Ops) open. The MySQL protocol port 3306 is not publicly accessible. You can choose one of your own security groups, or manually edit the default settings.

Choose a Keypair

By default, your marketplace image will launch into your first keypair in the region, sorted alphabetically. You can choose a new keypair to use from the dropdown list. Make sure you have the private key, otherwise you will not be able to access your image via SSH.

Launch Instance

Click the “Launch with 1-Click” button. Once the instance is provisioned, you can access the MemSQL Ops UI in your browser, using the IP address or public DNS name of your instance. The Ops UI user is “memsql” and the password is the AWS instance ID. You can find the instance ID in your EC2 console.

http://<your-instance-ip>

user: memsql
password: <instance-id>
Info

The default port for MemSQL Ops is 9000. However, in the AWS marketplace image of MemSQL, it is configured to listen on port 80, the default HTTP port.

To connect to the MemSQL database with a MySQL-compatible client from a different machine, make sure to open port 3306 in your Security Group. The default MemSQL user is “root” and the password is the AWS instance ID.

Setting up on AWS Manually

If you want to deploy a full cluster in AWS, you can spin up EC2 nodes the normal way and follow the standard Quick Start On-Premises guide. However, there are some “gotchas” with AWS that are outlined below.

Setting Correct Security Groups

MemSQL nodes communicate with each other on ports 3306 and 9000. Therefore your security group should be configured, at a minimum, to allow connections to each other on those ports. The easy way to do this is just open up those ports to the world. However, you may not want the world to be able to make a connection directly to MemSQL or to access the MemSQL-Ops dashboard, even though they are easily password protected. In this case specify limit access on these ports using the custom ip option.

If you’d like, you can take advantage of security groups to have a dynamic white-list. When you specify a custom ip in a security group you can, instead of inputting a network mask, put in the name of a security group. This will automatically white-list all EC2 nodes that are assigned to that security group. Start typing the name of your security group in the ip field (sg-) and you will see a pop-up menu that will allow you to select an existing security group. image

Additionally, the deploy process uses ssh to copy MemSQL-Ops binaries to all your nodes, so if you want to take advantage of that, these same considerations apply to port 22, or your SSH port of choice.

Clients connect to your aggregators on port 3306. Make sure you have a rule to allow your clients to do this. Additionally, you as the system owner will want to connect to your nodes on port 9000 to access the dashboard.

Mount Disks

Even if you configured an EC2 node to have an EBS Volume, it will not be mounted when you first start your node. If you configured it to have ephemeral storage, it will probably be mounted in /mnt but you should confirm this. You will need to ensure all nodes have a large disk mounted and make sure MemSQL-Ops is configured to deploy MemSQL to the large disk. MemSQL uses the disk for durability of in-memory tables, caching query plans, logs, and storing data in our on-disk columnar storage format. See Columnstore.

Use the df -h command to check if your disk is mounted on each node. This EC2 Node has a large(ish) ephemeral storage volume mounted to /mnt:

	ubuntu@ip-192-168-128-247:~$ df -h
	Filesystem      Size  Used Avail Use% Mounted on
	/dev/xvda1      7.8G  780M  6.6G  11% /
	none            4.0K     0  4.0K   0% /sys/fs/cgroup
	udev            7.4G   12K  7.4G   1% /dev
	tmpfs           1.5G  332K  1.5G   1% /run
	none            5.0M     0  5.0M   0% /run/lock
	none            7.4G     0  7.4G   0% /run/shm
	none            100M     0  100M   0% /run/user
	/dev/xvdb        37G   49M   35G   1% /mnt

This EC2 Node does not have a large disk mounted:

	ubuntu@ip-10-0-0-96:~$ df -h
	Filesystem      Size  Used Avail Use% Mounted on
	/dev/xvda1      7.8G  780M  6.6G  11% /
	none            4.0K     0  4.0K   0% /sys/fs/cgroup
	udev            7.9G   12K  7.9G   1% /dev
	tmpfs           1.6G  332K  1.6G   1% /run
	none            5.0M     0  5.0M   0% /run/lock
	none            7.9G     0  7.9G   0% /run/shm
	none            100M     0  100M   0% /run/user

If there is no large disk mounted, use the lsblk command to view your available disk devices:

	ubuntu@ip-10-0-0-96:~$ lsblk
	NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
	xvda    202:0    0     8G  0 disk
	└─xvda1 202:1    0     8G  0 part /
	xvdb    202:16   0   256G  0 disk

We have a 256GB Volume, xvdb, that is not mounted. To mount the disk, first you need to see if it needs a file system using file -s:

	ubuntu@ip-10-0-0-96:~$ sudo file -s /dev/xvdb
	/dev/xvdb: data

If you see a result of data then you can make the filesystem. If you already see a filesystem, you can skip this step. To make a filesystem use the mkfs tool:

	ubuntu@ip-10-0-0-96:~$ sudo mkfs -t ext4 /dev/xvdb
	mke2fs 1.42.9 (4-Feb-2014)
	Filesystem label=
	OS type: Linux
	Block size=4096 (log=2)
	Fragment size=4096 (log=2)
	Stride=0 blocks, Stripe width=0 blocks
	16777216 inodes, 67108864 blocks
	3355443 blocks (5.00%) reserved for the super user
	First data block=0
	Maximum filesystem blocks=4294967296
	2048 block groups
	32768 blocks per group, 32768 fragments per group
	8192 inodes per group
	Superblock backups stored on blocks:
		32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
		4096000, 7962624, 11239424, 20480000, 23887872

	Allocating group tables: done
	Writing inode tables: done
	Creating journal (32768 blocks): done
	Writing superblocks and filesystem accounting information: done

Now you are ready to mount the disk. If you choose to mount the disk to /var/lib/memsql, the default memsql directory, there is no need to further specify a custom data directory. First create an empty /var/lib/memsql directory, then mount the EBS volume to it:

	ubuntu@ip-10-0-0-96:~$ sudo mkdir /var/lib/memsql
	ubuntu@ip-10-0-0-96:~$ sudo mount /dev/xvdb /var/lib/memsql
	ubuntu@ip-10-0-0-96:~$ df -h
	Filesystem      Size  Used Avail Use% Mounted on
	/dev/xvda1      7.8G  780M  6.6G  11% /
	none            4.0K     0  4.0K   0% /sys/fs/cgroup
	udev            7.9G   12K  7.9G   1% /dev
	tmpfs           1.6G  332K  1.6G   1% /run
	none            5.0M     0  5.0M   0% /run/lock
	none            7.9G     0  7.9G   0% /run/shm
	none            100M     0  100M   0% /run/user
	/dev/xvdb       252G   60M  239G   1% /var/lib/memsql

After mounting, you should see in df -h that you have a large amount of space free in /var/lib/memsql, so you can proceed with a normal install.

Info

If your large volume is mounted somewhere other than /var/lib/memsql, go on to the next step to configure a custom install directory

Set a Custom Installation Directory for MemSQL

MemSQL installation directories are configured on a node-by-node basis within MemSQL Ops. To change the default MemSQL Installation directory on your first node, use the --memsql-installs-dir option in install.sh:

sudo ./install.sh --memsql-installs-dir=/mnt/memsql/
Info

This controls were MemSQL will eventually be installed. MemSQL-Ops will still be installed to its default location (/var/lib/memsql-ops).

MemSQL will be deployed to /mnt/memsql on this node. You also need to ensure that it is deployed to the correct directory on all your nodes. To do this, when you are deploying MemSQL-Ops to your other nodes using the graphical web interface, be sure to click the “edit” button before deploying! image

Once you click on edit, you can change the MemSQL Install path for one or many nodes to the location of your large volume: image

Was this article useful?