System Requirements and Recommendations min read


Info

This topic does not apply to MemSQL Helios.

The following are some requirements and recommendations you should follow when provisioning and setting up your host machines to optimize the performance of your cluster.

With the exception of the hardware and software requirements, all other settings are optional.

Hardware and software requirements

Each MemSQL node requires a host machine with at least four CPU cores and eight GB of RAM available per node.

When provisioning your host machines, the minimum Linux kernel version required is 2.6.32; however, for performance reasons MemSQL should be run on kernel 3.10 or later.

Our recommended platforms are the following:

  • RHEL/CentOS 6 or 7 (version 7 is preferred)
  • Debian 8 or 9 (version 9 is preferred)

For cloud deployments, all instances should be geographically deployed in a single region. Instance types that support Enhanced Networking should have it (or a similar feature) enabled.

Network port settings

Depending on the host machine and its function in deployment, some or all of the following port settings should be enabled on machines in your cluster. These routing and firewall settings must be configured to:

  • Allow database clients (e.g. your application) to connect to the MemSQL aggregators
  • Allow all nodes in the cluster to talk to each other over the MemSQL protocol (3306)
  • Allow you to connect to management and monitoring tools
Protocol Port Direction Description
TCP 3306 Inbound and Outbound Default port used by MemSQL. Required on all nodes for intra-cluster communication. Also required on aggregators for client connections.
TCP 22 Inbound and Outbound For host machine access. Required between nodes in MemSQL tool deployment scenarios. Also useful for remote administration and troubleshooting on the main deployment machine.
TCP 443 Outbound To get public repo key for package verification. Required for nodes downloading MemSQL APT or YUM packages.
TCP 8080 Inbound and Outbound Default port for MemSQL Studio. (Only required for the host machine running Studio.)

The service port values are configurable if the default values cannot be used in your deployment environment. For more information on how to change them, see the MemSQL configuration file, memsql-toolbox-config register-host command, and MemSQL Studio Installation Guide topics.

We also highly recommend configuring your firewall to prevent other hosts on the Internet from connecting to MemSQL.

Hardware recommendations

The following are additional hardware recommendations for optimal performance:

Component Recommendation
CPU 8 vCPU per host machine.
Memory At least 4GB per core, 32GB minimum per leaf node.
Storage Provide a storage system for each node with at least 3 times the capacity of main memory. SSD storage is recommended for columnstore workloads.

Here are some considerations when deciding on your hardware:

  • MemSQL rowstore storage capacity is limited by the amount of RAM on the host machine. Increasing RAM increases the amount of available data storage.
  • It is strongly recommended to run MemSQL leaf nodes on machines that have the same hardware and software specifications.
  • MemSQL is optimized for architectures supporting SSE4.2 and AVX2 instruction set extensions, but it will run successfully on x64 systems without these extensions. See our AVX2 verification topic for more information on how to verify if your system supports AVX2.
  • For concurrent loads on columnstore tables, SSD storage will improve performance significantly compared to HDD storage.

Enabling Cluster-on-Die (if supported)

If you are installing MemSQL natively and have access to the BIOS, you should enable Cluster-on-Die in the system BIOS for machines with Haswell-EP and later CPUs. When enabled, this will result in multiple NUMA regions being exposed per processor. MemSQL can take advantage of NUMA nodes by binding specific MemSQL nodes to those NUMA nodes, which in turn will result in higher MemSQL performance.

Software recommendations

In addition to these basic OS requirements, it is helpful to configure the underlying Linux OS in the following areas to get the most performance using MemSQL.

These tuning instructions should be done on each host machine in your cluster.

Configure Linux vm settings

MemSQL recommends letting first-party tools, such as memsql-admin, memsqlctl, or MemSQL Ops, configure your vm settings to minimize the likelihood of getting memory errors on your host machines. The default values used by the tools are the following:

  • vm.max_map_count is set to 1000000000
  • vm.min_free_kbytes is set to either 1% of system RAM or 4 GB, whichever is smaller

If the MemSQL tools cannot set the values for you, you will get an error message stating what the value should be and how to set it. You can set the values manually using the /sbin/sysctl command, as shown below.

sudo sysctl -w vm.max_map_count=1000000000
sudo sysctl -w vm.min_free_kbytes=658096

Enabling NUMA support

If the CPU(s) on your host machines supports Non-Uniform Memory Access (NUMA), MemSQL can take advantage of that and bind MemSQL nodes to NUMA nodes. Binding MemSQL nodes to NUMA nodes allows faster access to in-memory data since individual MemSQL nodes only access data that’s collocated with their corresponding CPU. If you do not configure MemSQL this way, performance will be greatly degraded due to expensive cross-NUMA-node memory access. Configuring for NUMA should be done as part of the installation process; however, you can reconfigure your deployment later, if necessary.

The MemSQL tools can do the NUMA binding for you; however, you must have numactl installed first. Perform the following steps on each host machine:

  1. Log into each host and install the numactl package. For example, for Debian-based OSes:

    sudo apt-get install numactl
    

    For Redhat/CentOS, run the following:

    sudo yum install numactl
    
  2. Check the number of NUMA nodes your machines by running numactl --hardware. For example:

    numactl --hardware
    available: 2 nodes (0-1)
    

    The output shows that there are 2 NUMA nodes on this machine, numbered 0 and 1.

For additional information, see Configuring for Non-Uniform Memory Access.

Disable Transparent Huge Pages

MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.

To disable THP, add the following lines to the end of rc.local before the exit line (if present), and reboot the host machines.

For Debian-based distributions, add the following to /etc/rc.d/rc.local:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

For RedHat/CentOS distributions, add the following to /etc/rc.local:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
Warning

The /etc/rc.local file points to /etc/rc.d/rc.local. And beginning with Red Hat Enterprise Linux 7 version, /etc/rc.d/rc.local is not executable by default for users. To add execute permissions, run the following command:

  chmod +x /etc/rc.d/rc.local
Info

The khugepaged/defrag option will be 1 or 0 on newer Linux versions (e.g. CentOS 7+), but yes or no on older versions (e.g. CentOS 6). You can check which your system uses by running cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag and keep only the matching setting. For example, if you see 1 or 0 update the line with echo 0. If you see yes or no, update the line with echo no.

Install and run Network Time Protocol service

Install and run ntpd to ensure that system time is in sync across all nodes in the cluster.

For Debian-based distributions like Ubuntu:

sudo apt-get install ntpd

For RedHat/CentOS distributions:

sudo yum install ntp

Recommendations for Optimal On-Premise Columnstore Performance

We support the EXT4 and XFS filesystems. Also, many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.

If you use NVMe drives, set the following parameters in Linux (make it permanent in /etc/rc.local):

# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests

Increase File Descriptor and Maximum Process Limits

A MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor and maximum process limits to the values listed below to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Permanently increase the open files limit and the max user processes limit for the memsql user by editing the /etc/security/limits.conf file as the root user and adding the following lines:

memsql    soft    NOFILE    1000000
memsql    hard    NOFILE    1000000
memsql    soft    nproc     128000
memsql    hard    nproc     128000

Configure Linux ulimit settings

Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Create swap space

It is recommended that you create a swap partition (or swap file on a dedicated device) to serve as an emergency backing store for RAM. MemSQL makes extensive use of RAM (especially with rowstore tables), so it is important that the operating system does not immediately start killing processes if MemSQL runs out of memory. Because typical machines running MemSQL have a large amount of RAM (>32 GB/node), the swap space can be small (<10% of physical RAM).

For more information setting up and configuring swap space, please refer to your distribution’s documentation.

After enabling these settings, your machines will be configured for optimal performance when running one or more MemSQL nodes.

Was this article useful?