The following components are fundamental to a MemSQL cluster. They are also key terms that are used throughout the documentation.
A node is a server that has an installation of an MemSQL instance.
A leaf is a node that stores a subset of a cluster’s data. A cluster typically contains many leaves.
A partition contains a subset (a shard) of a database’s data. A leaf contains multiple partitions. When you run CREATE DATABASE, MemSQL splits the database into partitions, which are distributed evenly among available leaves. With
CREATE DATABASE, you can specify the number of partitions with the
PARTITIONS=X option. If you don’t specify the number of partitions explicitly, the default is used (the number of leaves times the value of the
default_partitions_per_leaf engine variable.
Types of Tables
A distributed table’s data is distributed across partitions. By contrast, a reference table’s data is replicated, in full, to all nodes in a cluster.
A temporary table is distributed, but is not persisted.
An aggregator is a node that routes queries to the leaves, aggregates intermediate the results, and sends the results back to the client. There are two types of aggregators: master and child. A cluster contains exactly one master aggregator and may contain zero or more child aggregators (depending on query volume).
A master aggregator is a specialized aggregator responsible for cluster monitoring and failover.
Except for DDL operations and writes to reference tables, which must go through the master aggregator, any query can be run against any aggregator.
Aggregator to Leaf Ratio
The minimal setup for a MemSQL cluster is just one aggregator (the master aggregator) and one leaf. You can add more aggregators, which will read metadata from the master aggregator, and can run DML commands on the leaves.
The number of deployed aggregator and leaf nodes determines the storage size and performance of a cluster. Typical deployments have a 5:1 ratio of leaf:aggregator nodes. In a well-designed cluster:
- Applications that require higher connection capabilities from application servers have a higher aggregator-to-leaf node ratio.
- Applications with larger storage requirements have a higher leaf-to-aggregator node ratio.
Interaction of Cluster Components
You run database queries against an aggregator. Most of the communication between aggregators and leaves for query execution is also implemented as SQL statements.
Parallel Execution of Queries
By default, MemSQL will create one partition per CPU core on the leaves for maximum parallelism. This number is configurable cluster-wide with the
default-partitions-per-leaf variable, or as an optional parameter to CREATE DATABASE.
In the context of query execution, a partition is the granular unit of query parallelism. In other words, every parallel query is run with a level of parallelism equal to the number of partitions. You can view the partitions in a database with SHOW PARTITIONS.