MemSQL 3.1 Release Notes

Upgrading to 3.1

MemSQL supports various methods for upgrading to version 3.1 depending on your current installation. For more information see: Upgrading MemSQL.

Views

MemSQL 3.1 supports unmaterialized views. Views in MemSQL are not writeable, but do support online ALTER VIEW.

Cross-Datacenter Replication

MemSQL 3.1 supports automated cross-datacenter replication. Previously this had to be done with manual configuration of master and slave replicas. Now it can be done with a single command:

memsql> REPLICATE DATABASE db_name FROM user:'password'@'host':port;

Replication is done on a per-database basis. The topology and redundancy level of the secondary (replica) cluster do not need to match the topology or redundancy level of the primary cluster. Sharding of the replica database on the secondary cluster is managed automatically by MemSQL. Note that the secondary cluster must have access to the public IP addresses of all leaf nodes in the primary cluster. This is because MemSQL replicates data directly from the leaf nodes on the primary cluster to the leaf nodes on the secondary cluster. To get a list of leaf node public IPs, run SHOW LEAVES on the master aggregator of the primary cluster.

Replica databases are read-only. The only supported DML is SELECT and database DDL operations, such as CREATE TABLE and DROP TABLE, are disabled. However, distributed DDL such as REBALANCE PARTITIONS work normally to perform distributed operations on the secondary cluster.

MemSQL offers a number of simple commands for managing replication. Users can pause and restart replication using PAUSE REPLICATING and CONTINUE REPLICATING, respectively. Running STOP REPLICATING will stop replication and convert the replica database to an independent MemSQL database with all standard DDL and DML.

Transactional Multi-INSERT

Single-statement INSERT queries that insert multiple records are now transactional. While attempting to complete the query, if MemSQL encounters a duplicate key, the entire transaction will roll back. Note that transactionality slows multi-INSERT by roughly 20% (depending on network performance). For cases where performance is more important than transactionality, MemSQL 3.1 allows users to disable transactions using INSERT IGNORE. In this case, MemSQL will ignore records with duplicate keys and, without rolling back, continue inserting records with unique keys. For more information see INSERT.

Additional ALTER TABLE Support

MemSQL 3.1 supports ALTER TABLE ... DROP INDEX and ALTER TABLE ... MODIFY COLUMN. Both are online operations. For more information, see ALTER TABLE.

Support for Uncorrelated Subselects with IN Lists

MemSQL 3.1 adds support for uncorrelated subselects with IN lists. For example, the following query will now execute:

memsql> SELECT a.name FROM Table_A a WHERE a.user_id IN (SELECT b.user_id FROM Table_B b WHERE b.age > 23);

Previously queries with this structure could only execute if there was a foreign shard key relationship between the expression preceding IN and the projection of the subquery. Now the above query will execute even if user_id is not a foreign shard key.

MemSQL now also supports the SQL_SMALL_RESULT keyword to be used with IN list uncorrelated subselects. SQL_SMALL_RESULT tells MemSQL to temporarily materialize the IN list, rather than comparing the projection from the outer query with the result of the IN list “natively” without materializing the list. When the result of the subquery defining the IN list has relatively few results, SQL_SMALL_RESULT will allow the query to execute faster. Note that MemSQL always materializes the IN list unless the inner subquery is on a reference table or it can leverage a foreign shard key relationship.

Note

MemSQL uses SQL_SMALL_RESULT differently than MySQL does. In MySQL, SQL_SMALL_RESULT is only used with GROUP BY and DISTINCT queries. MemSQL only uses SQL_SMALL_RESULT with IN lists as described above. For other queries, MemSQL simply ignores SQL_SMALL_RESULT.

Streaming GROUP BY With SQL_BIG_RESULT

For GROUP BY queries, MemSQL 3.1 supports the SQL_BIG_RESULT keyword, which tells MemSQL to do as much sorting as possible on the leaves rather than performing all sorting on the aggregator. SQL_BIG_RESULT can improve performance for SELECT queries with large result sets because it parallelizes the sorting process and can save significant memory on the aggregator. This is the default behavior when the GROUP BY is over indexed columns. For more information see SELECT.

LOAD DATA Improvements

MemSQL 3.1 improves LOAD DATA in a number of ways. In particular, performance is improved thanks to the following changes:

  • Files are streamed throughout the LOAD DATA process rather than buffering the entire dataset in memory
  • File contents are streamed directly to leaf nodes, whereas they used to be encoded as multi-inserts
  • Various low-level optimizations

In addition, LOAD DATA is now transactional. If the query fails, the entire load rolls back. As with multi-inserts, transactionality can be turned off using the IGNORE keyword. For more information see LOAD DATA.

Finally, it is now possible to run multiple concurrent LOAD DATA queries at once.

Manually Triggered Snapshots

MemSQL 3.1 enables users to manually trigger full database snapshots using the SNAPSHOT command on an aggregator. The snapshot process is the same one MemSQL uses for periodic, automatic snapshots. The advantage of manual snapshots is that they allow DBAs to effectively compress log files at will to expedite recovery time. For example, after running several DDL commands, such as ALTER TABLE, replaying log files may take longer than desired since the database will be restored to an earlier state and then has to perform the ALTER again. By triggering a snapshot after performing DDL operations, the latest snapshot will have all the changes and recovery does not require replaying the DDL.

Support for Distributed INSERT ... SELECT

INSERT ... SELECT can now be used with arbitrary distributed tables. If the query can be completed while maintaining the foreign shard key relationship, the process happens entirely on the leaves. Otherwise, the data is streamed to the aggregator. For more information see INSERT.

New ATTACH LEAF Utilities

MemSQL 3.1 enables ATTACH LEAF ALL, a command that attaches all leaves that are currently in the DETACHED state. In addition, clusters configured with redundancy level 1 will automatically reattach leaves unless the user manually detaches them. For more information see ATTACH LEAF.

Table-Level Security

MemSQL 3.1 supports table-level security. User-based permissions are managed using GRANT and REVOKE, just as with database-level security. Table-level permissions can also be applied to views. MemSQL does not currently support row or column-level security. However, an easy way to enforce row or column-level access controls is to create a view that selects particular rows or columns and configure permissions on the view.