There are three types of tables in MemSQL: reference tables, sharded tables, and temporary tables.
Reference tables are relatively small tables that do not need to be distributed and are present on every node in the cluster. They are both created and written to on the master aggregator. Reference tables are implemented via primary-secondary replication to every node in the cluster from the master aggregator. Replication enables reference tables to be dynamic: updates that you perform to a reference table on the master aggregator are quickly reflected on every machine in the cluster.
MemSQL aggregators can take advantage of reference tables’ ubiquity by pushing joins between reference tables and a distributed table onto the leaves. Imagine you have a distributed clicks table storing billions of records and a smaller customers table with just a few million records. Since the customers table is small, it can be replicated on every node in the cluster. If you run a join between the clicks table and the customers table, then the bulk of the work for the join will occur on the leaves.
Reference tables are a convenient way to implement dimension tables.
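As a rough sketch of the clicks and customers scenario above, the statements below create a reference table and a sharded table, then join them. The column names are hypothetical and chosen only for illustration; the DDL would be run on the master aggregator.

```sql
-- Hypothetical schema for illustration; column names are assumptions.

-- Small dimension table, replicated to every node in the cluster.
CREATE REFERENCE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    name VARCHAR(255)
);

-- Large fact table, hash-partitioned over its primary key.
CREATE TABLE clicks (
    click_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    clicked_at DATETIME
);

-- Each leaf holds a full copy of customers, so every partition of
-- clicks can be joined locally before results are aggregated.
SELECT cu.name, COUNT(*) AS click_count
FROM clicks c
JOIN customers cu ON cu.customer_id = c.customer_id
GROUP BY cu.name;
```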
Every database in the distributed system is split into a number of partitions. Each sharded table created is split with hash partitioning over the table’s primary key; a portion of the table is stored in each of its database’s partitions.
Partitions are implemented as databases on the leaves. For example, partition 3 of database db is stored in a database db_3 on one of the leaves. For every sharded table created inside db, a portion of its data will reside in db_3 on that leaf.
Although sharded tables in the same database share the same database containers on the leaves, in general no assumptions can be made about particular rows from different tables being co-located on a partition. The exception is a join on the column(s) that both tables are sharded on: in that case MemSQL can perform a co-located join, which will improve the speed of the join.
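A minimal sketch of a join that qualifies for co-location follows. The table and column names are hypothetical. The first table is sharded on its primary key; the second uses an explicit SHARD KEY clause to shard on the same column, since its join column is not its primary key.

```sql
-- Hypothetical tables; names and columns are assumptions.
CREATE TABLE accounts (
    account_id BIGINT PRIMARY KEY,   -- sharded on account_id (the primary key)
    owner VARCHAR(255)
);

CREATE TABLE payments (
    payment_id BIGINT,
    account_id BIGINT,
    amount DECIMAL(12, 2),
    SHARD KEY (account_id)           -- sharded on the same column as accounts
);

-- The join condition matches the sharding column of both tables,
-- so matching rows hash to the same partition and can be joined
-- on the leaves without moving data between them.
SELECT a.owner, SUM(p.amount) AS total_paid
FROM accounts a
JOIN payments p ON p.account_id = a.account_id
GROUP BY a.owner;
```

Joining these two tables on any other column would not satisfy the condition described above.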
Temporary tables store data in memory and exist for the duration of a client session. This means they are scoped to the connection that created them, cannot be queried by other users, and are dropped once the connection has ended. They can also be dropped manually without closing the connection.
MemSQL does not write logs or take snapshots of temporary tables. Temporary tables are designed for temporary, intermediate computations and can only be created in rowstores. Since temporary tables are not persisted in MemSQL, they have high availability disabled. This means that if a node in a cluster goes down and a failover occurs, all the temporary tables on the cluster lose data. Whenever a query references a temporary table after a node failover, it returns an error. For example,
"Temporary table <table> is missing on leaf <host> due to failover. The table will need to be dropped and recreated."
To prevent loss of data on node failover, use MemSQL tables that have high availability enabled.
Views cannot reference temporary tables because temporary tables only exist for the duration of a client session. Although MemSQL does not materialize views, views are available as a service for all clients, and so cannot depend on client session-specific temporary tables.
Unlike CREATE TABLE without the TEMPORARY option, CREATE TEMPORARY TABLE can be run on any aggregator, not just the master aggregator. Temporary tables are sharded tables, and can be modified and queried like any “permanent” table, including distributed joins.
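The lifecycle described above might look like the following within a single connection. The table name, columns, and values are hypothetical.

```sql
-- Hypothetical example; can be run on any aggregator.
CREATE TEMPORARY TABLE recent_clicks (
    click_id BIGINT PRIMARY KEY,
    customer_id BIGINT
);

-- Visible only to this connection.
INSERT INTO recent_clicks VALUES (1, 42), (2, 42), (3, 7);

SELECT customer_id, COUNT(*) AS n
FROM recent_clicks
GROUP BY customer_id;

-- Optional: drop manually instead of waiting for the session to end.
DROP TABLE recent_clicks;
```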
Global Temporary Tables
Another type of temporary table is the global temporary table. Like temporary tables, these can only be created in rowstores, and are not persisted.
Unlike temporary tables, global temporary tables exist beyond the duration of a session. They are never automatically dropped and must always be dropped manually. They can also be queried by other users since they are not session dependent.
If failover occurs, global temporary tables lose data and enter an errored state; they need to be dropped and recreated. However, dropping a global or non-global temporary table does not remove its plancache from disk, and the cache is retained if the table is recreated with the same schema.
The CREATE GLOBAL TEMPORARY TABLE command can be run only on the master aggregator. However, DML commands for global temporary tables can be run from both the master aggregator and child aggregators.
Note: Neither global temporary tables nor non-global temporary tables can be altered.
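A minimal sketch of working with a global temporary table, assuming the CREATE GLOBAL TEMPORARY TABLE form named above is accepted as-is by your version (the exact syntax may vary between releases). The table name, columns, and values are hypothetical.

```sql
-- Hypothetical example; the CREATE must be run on the master aggregator.
CREATE GLOBAL TEMPORARY TABLE staging_totals (
    customer_id BIGINT PRIMARY KEY,
    total BIGINT
);

-- DML can be issued from the master aggregator or a child aggregator,
-- and from sessions other than the one that created the table.
INSERT INTO staging_totals VALUES (42, 2), (7, 1);

SELECT customer_id, total
FROM staging_totals;

-- Never dropped automatically; remove it explicitly when finished.
DROP TABLE staging_totals;
```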
- CREATE TABLE
- SHOW TABLES
- SHOW COLUMNS
- ALTER TABLE
- SHOW TABLE STATUS
- FLUSH TABLES
- UNLOCK TABLES
- DROP TABLE