System Requirements and Recommendations

The following are some requirements and recommendations you should follow when provisioning and setting up your host machines to optimize the performance of your cluster.

With the exception of the hardware and software requirements, all other settings are optional.

Hardware and software requirements

Each MemSQL node requires a host machine with at least four CPU cores and eight GB of RAM available per node.

When provisioning your host machines, the minimum Linux kernel version required is 2.6.32; however, for performance reasons MemSQL should be run on kernel 3.10 or later.

Our recommended platforms are the following:

  • RHEL/CentOS 6 or 7 (version 7 is preferred)
  • Debian 8 or 9 (version 9 is preferred)

The steps in this tutorial have been validated with CentOS 6.10 and 7.5 and with Debian 9.

For cloud deployments, all instances should be geographically deployed in a single region. Instance types that support Enhanced Networking should have it (or a similar feature) enabled.

Network port settings

Depending on the host machine and its function in deployment, some or all of the following port settings should be enabled on machines in your cluster.

Protocol Port Direction Description
TCP 3306 Inbound and Outbound Default port used by MemSQL. Required for all nodes.
TCP 22 Inbound and Outbound For host machine access. Required for all nodes in MemSQL tool deployment scenarios.
TCP 80 Outbound To access package repos. Required for nodes downloading MemSQL APT or YUM packages.
TCP 443 Outbound To get public repo key for package verification. Required for nodes downloading MemSQL APT or YUM packages.
TCP 8080 Inbound and Outbound Default value for MemSQL Studio. (Only required for the machine that will host Studio.)
TCP 9000 Inbound and Outbound Default port for MemSQL Ops. (Only required if your cluster is using Ops.)

In this tutorial, you will use one of your host machines to deploy MemSQL across your host machines. This machine must have all of the above network settings enabled. The other host machines only require ports 3306 and 22 to be open.

Also, MemSQL, Studio, and Ops ports are configurable if the default values cannot be used in your deployment environment.

Hardware recommendations

The following are additional hardware recommendations for optimal performance:

Component Recommendation
CPU Two socket x64-based server with at least 4 total cores per socket.
Memory At least 4GB per core, 32GB minimum per node.
Storage Provide a storage system for each node with at least 3 times the capacity of main memory. SSD storage is recommended for columnstore workloads.

Here are some considerations when deciding on your hardware:

  • MemSQL rowstore storage capacity is limited by the amount of RAM on the host machine. Increasing RAM increases the amount of available data storage.
  • It is strongly recommended to run MemSQL leaf nodes on machines that have the same hardware and software specifications.
  • MemSQL is optimized for architectures supporting SSE4.2 and AVX2 instruction set extensions, but it will run successfully on x64 systems without these extensions. See our AVX2 verification topic for more information on how to verify if your system supports AVX2.
  • For concurrent loads on columnstore tables, SSD storage will improve performance significantly compared to HDD storage.

Enabling Cluster-on-Die (if supported)

If you are installing MemSQL natively and have access to the BIOS, you should enable Cluster-on-Die in the system BIOS for machines with Haswell-EP and later CPUs. When enabled, this will result in multiple NUMA regions being exposed per processor. MemSQL can take advantage of NUMA nodes by binding specific MemSQL nodes to those NUMA nodes, which in turn will result in higher MemSQL performance.

Software recommendations

In addition to these basic OS requirements, it is helpful to configure the underlying Linux OS in the following areas to get the most performance using MemSQL.

These tuning instructions should be done on each host machine in your cluster.

Enabling NUMA support

If the CPU(s) on your host machines supports Non-Uniform Memory Access (NUMA), MemSQL can take advantage of that and bind MemSQL nodes to NUMA nodes. Binding MemSQL nodes to NUMA nodes allows faster access to in-memory data since individual MemSQL nodes only access data that’s collocated with their corresponding CPU. If you do not configure MemSQL this way, performance will be greatly degraded due to expensive cross-NUMA-node memory access. Configuring for NUMA should be done as part of the installation process; however, you can reconfigure your deployment later, if necessary.

The MemSQL tools can do the NUMA binding for you; however, you must have numactl installed first. Perform the following steps on each host machine:

  1. Log into each host and install the numactl package. For example, for Debian-based OSes:

    sudo apt-get install numactl

    For Redhat/CentOS, run the following:

    sudo yum install numactl
  2. Check the number of NUMA nodes your machines by running numactl --hardware. For example:

    numactl --hardware
    available: 2 nodes (0-1)

    The output shows that there are 2 NUMA nodes on this machine, numbered 0 and 1.

For additional information, see Configuring for Non-Uniform Memory Access.

Disable Transparent Huge Pages

MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.

To disable THP, add the following lines to the end of rc.local before the exit line (if present), and reboot the host machines.

For Debian-based distributions, add the following to /etc/rc.d/rc.local:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

For RedHat/CentOS distributions, add the following to /etc/rc.local:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

The khugepaged/defrag option will be 1 or 0 on newer Linux versions (e.g. CentOS 7+), but yes or no on older versions (e.g. CentOS 6). You can check which your system uses by running cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag and keep only the matching setting. For example, if you see 1 or 0 update the line with echo 0. If you see yes or no, update the line with echo no.

Install and run Network Time Protocol service

Install and run ntpd to ensure that system time is in sync across all nodes in the cluster.

For Debian-based distributions like Ubuntu:

sudo apt-get install ntpd

For RedHat/CentOS distributions:

sudo yum install ntp

Recommendations for Optimal On-Premise Columnstore Performance

We support the EXT4 and XFS filesystems. Also, many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.

If you use NVMe drives, set the following parameters in Linux (make it permanent in /etc/rc.local):

# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests

Increase File Descriptor and Maximum Process Limits

A MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor and maximum process limits to the values listed below to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Permanently increase the open files limit and the max user processes limit for the memsql user by editing the /etc/security/limits.conf file as the root user and adding the following lines:

memsql    soft    NOFILE    1000000
memsql    hard    NOFILE    1000000
memsql    soft    nproc     128000
memsql    hard    nproc     128000

Configure Linux ulimit settings

Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Create swap space

It is recommended that you create a swap partition (or swap file on a dedicated device) to serve as an emergency backing store for RAM. MemSQL makes extensive use of RAM (especially with rowstore tables), so it is important that the operating system does not immediately start killing processes if MemSQL runs out of memory. Because typical machines running MemSQL have a large amount of RAM (>32 GB/node), the swap space can be small (<10% of physical RAM).

For more information setting up and configuring swap space, please refer to your distribution’s documentation.

After enabling these settings, your machines will be configured for optimal performance when running one or more MemSQL nodes.

Network Requirements

MemSQL requires routing and firewall settings to be configured to:

  • allow database clients (e.g. your application) to connect to the MemSQL aggregators
  • allow all nodes in the cluster to talk to each other over the MemSQL protocol (3306) and MemSQL Agent protocol (9000). Note that these port assignments are configurable; for more information on how to change them, see the MemSQL Configuration and the AGENT-START topics.
Port Source Destination
3306 Clients Aggregator Nodes
3306 All MemSQL Nodes All MemSQL Nodes
9000 All MemSQL Nodes All MemSQL Nodes

We also highly recommend configuring your firewall to prevent other hosts on the Internet from connecting to MemSQL.

Was this article useful?