MemSQL 4.0 Release Notes

MemSQL 4.0 includes improved support for distributed joins, geospatial data, and numerous functionality, stability, and performance improvements.

New SQL features

Distributed joins: MemSQL now supports arbitrary distributed joins. It is no longer necessary to shard distributed tables on the same set of columns in order to join them, or to use subqueries to join tables that are not identically sharded. When a SQL query joins two tables that are sharded differently, the necessary data is shuffled between nodes in the cluster automatically. Joining two identically sharded tables behaves as before, with no network overhead. See Distributed Joins for more details.
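As an illustration, consider two hypothetical tables sharded on different columns (orders on id, customers on customer_id); in MemSQL 4 they can be joined directly, with the shuffle handled transparently:

-- orders is sharded on id, customers on customer_id, so the shard
-- keys do not line up; MemSQL 4 shuffles rows between leaves as needed.
SELECT c.name, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;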

Geospatial support: MemSQL 4 includes two new data types: Geography and GeographyPoint. A Geography object can describe a point on planet Earth (longitude / latitude), a linestring (for example, a jogging path), or a polygon (e.g., the borders of a city). The GeographyPoint type is optimized for point-only data. MemSQL also supports topological, measurement, and comparison functions for geospatial data, as well as spatial joins. Please see the Geospatial Guide to learn more.
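A sketch of point-only storage and a containment query (the table and polygon coordinates are made up for illustration; see the Geospatial Guide for the full set of functions):

-- Hypothetical table of delivery stops using the point-optimized type.
CREATE TABLE stops (
  id INT NOT NULL,
  location GEOGRAPHYPOINT NOT NULL,
  SHARD KEY (id)
);

-- Find all stops inside a polygon given in WKT form.
SELECT id FROM stops
WHERE GEOGRAPHY_CONTAINS(
  'POLYGON((-122.45 37.80, -122.39 37.80, -122.39 37.73, -122.45 37.73, -122.45 37.80))',
  location);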

Updates and deletes with subqueries: You can now delete or update records using a subselect. For example:

update products set price = price * 0.8
where id in (SELECT id FROM sale_items);

delete from products where id in
(SELECT id from discontinued_items);

Nested views: It is now possible to create a view that references other views.

memsql> create view active_items as
 -> select * from items where status = 'active';

memsql> create view discounted_items as
 -> select * from active_items where discount = 1;

Additional SQL support: New data types and functions include DATETIME(6) for microsecond precision, TIME_FORMAT for converting a time into a formatted string, TIMEDIFF for calculating intervals of time, and APPROX_COUNT_DISTINCT for calculating quick approximate distinct counts over large data sets.
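A sketch combining these additions, using a hypothetical events table:

-- DATETIME(6) keeps microsecond precision on both timestamps.
CREATE TABLE events (
  id INT NOT NULL,
  user_id INT,
  started_at DATETIME(6),
  ended_at DATETIME(6)
);

-- TIMEDIFF yields the interval; TIME_FORMAT renders it as a string;
-- APPROX_COUNT_DISTINCT gives a fast approximate distinct count.
SELECT TIME_FORMAT(TIMEDIFF(ended_at, started_at), '%H:%i:%s') AS duration,
       APPROX_COUNT_DISTINCT(user_id) AS approx_users
FROM events
GROUP BY duration;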

Multi-statement batch support: Multiple SQL statements can now be executed as part of a single batch. For example:

memsql> delimiter $$
memsql> use mydb;
memsql> insert into mytable values (1, 'hello');
     -> select * from mytable;
     -> $$
Query OK, 1 row affected (0.00 sec)

+----+-------+
| id | name  |
+----+-------+
|  1 | hello |
+----+-------+
1 row in set (0.01 sec)

Changes to monitoring and installation

MemSQL Ops and the Cluster Installer program have been merged into one and rewritten from the ground up. Installation and configuration of large clusters has been greatly improved. Performance data no longer requires storage in the MemSQL database itself, and common management operations like adding and removing leaf servers can now be done via the user interface.

By default, MemSQL Cluster Installer now creates one partition for each CPU core on the leaf machines in the cluster. If it fails to determine a sensible number, for example because leaves have different numbers of cores, it falls back to the previous default of 8 partitions per leaf.

Performance improvements

Inferred shard keys: The “foreign shard key” parameter is no longer necessary for maximum performance when joining tables. Starting with MemSQL 4, if any two distributed tables have identical shard key signatures, they are joined locally on the leaves automatically. No explicit reference to the “foreign” table is needed in the table definition. For example:

create table subspecies (
  id int not null,
  species_id int not null,
  name varchar(64),
  FOREIGN SHARD KEY (species_id) REFERENCES species (id)
);

This can now be written simply as:

create table subspecies (
  id int not null,
  species_id int not null,
  name varchar(64),
  SHARD KEY (species_id)
);

Operational improvements

Columnstore replication and failover: Columnstore tables can now be created with replication and high availability.
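For illustration, a columnstore table is still declared via a clustered columnstore key; in MemSQL 4 such a table can live in a redundancy-2 (high-availability) database like any rowstore table (the schema below is hypothetical):

CREATE TABLE clicks (
  ts DATETIME NOT NULL,
  url VARCHAR(2048),
  -- The clustered columnstore key selects the on-disk columnstore format
  -- and the sort order of the segments.
  KEY (ts) USING CLUSTERED COLUMNSTORE
);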

Auto-attach leaves: In previous versions of MemSQL, leaf recovery (for example, after a machine restart) required a manual ATTACH LEAF operation. With MemSQL 4, recovering leaves are automatically detected and reattached.

More operational data: Enhancements have been made to tables in the information_schema database, making it even easier to analyze the structure of databases, query performance, and space consumption.
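As a sketch of the kind of query this enables (the exact set of tables and columns varies by version; check information_schema on your cluster):

-- List the tables in a database along with basic metadata.
SELECT table_name, table_rows
FROM information_schema.TABLES
WHERE table_schema = 'mydb';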

There have been numerous improvements to the speed of failing over nodes, database synchronization, data replication, and general stability.

Breaking changes

Columnstore Format: the on-disk format of Columnstore tables was changed in 4.0. To upgrade a cluster from 3.2 or earlier to MemSQL 4, please contact support@memsql.com.

NULL-handling in LOAD DATA: In previous versions of MemSQL, loading the literal string NULL via the LOAD DATA command produced a null value unless it was enclosed in quotes. In version 4 this behavior was changed to match MySQL’s semantics. See the documentation for LOAD DATA for more details.
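Under MySQL semantics, the unquoted sequence \N in the input file loads as SQL NULL, while the bare word NULL loads as the four-character string 'NULL'. A sketch (file path and table are hypothetical):

-- items.csv contains:
--   1,\N      -> second column loads as SQL NULL
--   2,NULL    -> second column loads as the string 'NULL'
LOAD DATA INFILE '/tmp/items.csv'
INTO TABLE items
FIELDS TERMINATED BY ',';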

Maintenance Release Changelog

2015-08-20 Version 4.0.4

  • Fixed small memory leak when connection is created (D11888)
  • Fixed an issue with INSERT ... SELECT into timestamp columns (D11698)

2015-08-04 Version 4.0.3

  • Randomize keepalive timers to fix some of the connection drop errors when clusters are under heavier load (D11665)
  • Only set TCP options on INET sockets (D11574)
  • Fix SHOW LEAF STATUS (D11661)
  • Add several mysql system variables for MySQL JDBC driver compatibility (D11608)

2015-07-21 Version 4.0.2

  • Fixed skewed results from reshuffled INSERT ... SELECT with a multi-column shard key (D11498)
  • Set max_connection_threads to 128 on aggregators (D11496)
  • Fix stuck columnstore recovery (D11458)
  • Fix derived table shard keys being out of order (D11433)
  • Create datadir if it doesn’t exist (D11416)

2015-07-07 Version 4.0.1

  • Fixed RESTORE on top of an already-existing database corrupting the database (D11328)
  • Add support for CREATE RESULT TABLE .. PARTITION BY ... REFERENCES ... which forces the type of the columns in the partition by to be the types from the references clause. (D11280)
  • Allow insert .. select with timestamp columns to be pushed down (D11256)
  • Improved failover speed during a network partition
  • Remove service script dependency on mutable filesystem (D11246)
  • Output the right number of columns on the aggregators if we get too many from leaf (D11245)
  • Default the pidfile into the datadir rather than the root (D11236)
  • Added flags to the service script to make it usable in a readonly env (D11230)
  • Fixed assert in ReorderLeftJoins in correlated subselect
  • Don’t release rowlocks mid-upsert (D10831)
  • Fix reprovisioning sharding database problem
  • Added rewrite to reorder inner joins before left joins (D10505)
  • Fixed miscellaneous LOAD DATA bugs
  • Fix SET protocol statement (D11147)
  • Fix multiple issues with expressions in upserts (D11136)