6.0 Release Notes
MemSQL 6.0 includes enhancements to query processing with up to 80X performance improvement for group-by/aggregate queries, and broader SQL support. It also introduces extensibility features, and enhanced manageability and resiliency.
MemSQL 6.0 contains the following new capabilities:
MemSQL 6 introduces extensibility features as part of the new MemSQL Procedural SQL (MPSQL) language. MemSQL developers can create user-defined:
- Stored procedures (SPs)
- Scalar-valued functions (UDFs)
- Table-valued functions (TVFs)
- Aggregate functions (UDAFs)
SPs, UDFs, and UDAFs are compiled to machine code for high performance. Array and record types are supported in SPs and UDFs. SPs and UDFs also support exception handling. The MPSQL language will be familiar and straightforward to learn for database developers who have written functions and stored procedures in other database languages.
Query Language Features
The following new query language features are included in the release:
- Window functions with complex frames like “between 5 preceding and current row.” These are useful for moving averages needed for financial applications, electric utilities applications, and more.
- New window functions, including first_value, last_value, nth_value, percentile_disc, percentile_cont
- Cross-database joins and cross-database INSERT … SELECT
- UPDATE and DELETE through join
- UPDATE with subselect in SET clause
- Unenforced unique constraints
- INTERSECT, EXCEPT and MINUS
- Automatic cardinality statistics for columnstore
- Improved cost-based join optimization
- Improved hash join selection
- Improved optimization of UPDATE and DELETE queries
- Improved optimization of LEFT JOIN
- Enhanced join cardinality estimation
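As an illustration of the window frame mentioned in the list above, the sketch below computes a moving average over the current row and the five rows before it. It assumes standard SQL frame syntax and uses Python's built-in sqlite3 module (SQLite 3.25 or newer) purely as a stand-in engine. The prices table and its columns are hypothetical, and MemSQL syntax details may differ.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in engine;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    enumerate([10, 12, 11, 13, 15, 14, 16, 18], start=1),
)

# The frame covers the current row plus up to five rows before it.
query = """
    SELECT day,
           AVG(price) OVER (
               ORDER BY day
               ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM prices
    ORDER BY day
"""
moving_avg = [avg for _day, avg in conn.execute(query)]
print(moving_avg)
```

For the first five rows the frame is shorter than six rows, so the average is taken over however many rows are available.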
Query execution improvements include:
- Enhanced operations on encoded (compressed) data
- Single-Instruction Multiple Data (SIMD) support using AVX-2
- Queries like “select a, count(*) from t group by a” on columnstore tables speed up by as much as 80X
- Best-case performance for group-by/count(*) queries can exceed one billion rows per second per core on the latest processors
- Improvements of up to 30X on queries that:
  - Filter on run-length- and dictionary-encoded data
  - Group-by on encoded data, with one or more sum/avg/min/max or other aggregates
  - Perform hash joins of small dimension tables to fact tables on encoded columns, through improved Bloom filter join acceleration
- More aggressive use of dictionary and run length encoding to benefit query execution
- Improved columnstore update performance
- Improved performance for access to JSON fields in columnstore tables
- Improved columnstore segment elimination based on hash joins
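To give a rough intuition for why operating directly on encoded data can speed up a query such as “select a, count(*) from t group by a”, the sketch below counts rows per group from a run-length-encoded column without decoding it. This is a conceptual illustration only, not MemSQL's implementation. The function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def group_count_encoded(runs):
    """Compute per-value row counts by adding run lengths."""
    counts = Counter()
    for value, length in runs:
        counts[value] += length  # one addition per run, not per row
    return dict(counts)


column_a = ["x"] * 4 + ["y"] * 3 + ["x"] * 2 + ["z"]
runs = rle_encode(column_a)
counts = group_count_encoded(runs)
print(runs)
print(counts)
```

Here ten rows collapse into four runs, so the aggregation does four additions instead of ten. The saving grows with run length.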
Improvements to the columnstore include:
- Computed columns can be added to columnstore tables via
- The default columnstore segment size has been increased from 102,400 to 1,024,000.
Data loading enhancements include:
- Variable and expression support in LOAD DATA
- Pipelines scheduling improvements
Manageability and Resiliency
Manageability and resiliency have been improved for the 6.0 release, including:
- Enhanced fault tolerance for leaf failover, master aggregator failover, and ALTER TABLE.
- Reduced blocking for backups, ALTER TABLE, leaf failover, and system management operations.
Benefits include fewer situations that require manual intervention, simplified application development, and improved availability.
MemSQL Streamliner was previously deprecated and is removed in MemSQL 6.0. For current Streamliner users, we recommend migrating to MemSQL Pipelines instead. MemSQL Pipelines provides increased stability, improved ingest performance, and exactly-once semantics. For more information about Pipelines, see the MemSQL Pipelines documentation.
What Do I Do With My Old Streamliner Spark Cluster?
To upgrade to MemSQL 6.0, you must first remove each of your Streamliner pipelines, and then uninstall the Spark Cluster co-located with your MemSQL cluster. Once each of the pipelines has been removed, run memsql-ops spark-uninstall to uninstall the Spark Cluster.
Can I Still Use Spark With MemSQL?
Yes, MemSQL supports Spark integration via the MemSQL Spark Connector. The connector allows you to leverage your existing Spark clusters to write data directly to MemSQL via a performant and easy-to-use API. For more information on the MemSQL Spark Connector, please see the MemSQL Spark 2.0 Connector GitHub page.
How Do I Get Started With Pipelines?
For more information on Pipelines, please see:
Maintenance Release Changelog
2018-01-11 Version 6.0.13 (LATEST)
- Fix a compatibility issue which prevented the MariaDB 2.2.0 JDBC driver from connecting to MemSQL
- Fix an issue with auto statistics getting incorrectly refreshed after a node restart when auto stats were enabled with ALTER TABLE
- Reduce the memory used by the merger threads to refresh stale auto statistics
- Keep fewer deleted blob files smaller than 16 KB in the columnstore replication window to reduce inode usage. The number of files kept is bounded by disk usage (2 GB), but for small blob files, 2 GB of disk usage could exhaust the inodes on the file system.
- Out of memory handling hardening for
- Enhancements to the columnstore flusher to reduce disk IO
- Include CPU feature flags in the persisted plan cache lookup for plans stored on disk, to prevent a plan from being used on processors without the required features (avx2, etc.). This protects against the underlying hardware changing after a node restart, which can happen in some cloud environments. If this happens, query plans are now recompiled.
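The idea behind the plan cache change can be sketched as follows: if the cache key includes the CPU feature flags, a plan compiled on one processor is simply not found on a processor with different features, which forces a recompile. This is a minimal sketch under stated assumptions, not MemSQL's actual key format. The plan_cache_key and lookup functions, the flag names, and the hashing scheme are all hypothetical.

```python
import hashlib


def plan_cache_key(query_text, cpu_flags):
    """Build a lookup key from the query text and the host's CPU feature flags."""
    flags = ",".join(sorted(cpu_flags))  # sort so flag order does not matter
    return hashlib.sha256(f"{query_text}|{flags}".encode()).hexdigest()


query = "SELECT a, COUNT(*) FROM t GROUP BY a"

# Hypothetical persisted cache holding one plan compiled on an AVX2 host.
plan_cache = {plan_cache_key(query, {"sse4_2", "avx2"}): "plan compiled with AVX2"}


def lookup(query_text, cpu_flags):
    # A miss (None) means the plan must be recompiled for this hardware.
    return plan_cache.get(plan_cache_key(query_text, cpu_flags))


same_host = lookup(query, {"avx2", "sse4_2"})  # same features: cache hit
new_host = lookup(query, {"sse4_2"})           # no AVX2: cache miss
print(same_host, new_host)
```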
2017-12-13 Version 6.0.11
- Reduced memory overhead for automatic statistics on columnstore tables
- MemSQL can now handle Lz4 compressed columnstore blob files larger than 4 GB
- Improved out of memory handling during binding and parsing
- The ROWS column in Information_schema.table_statistics now includes rows in the in-memory segment of a columnstore table. It also correctly reports the ROWS for tables with only a hash index.
- Fixed a bug that caused an error when a LOAD DATA statement that used an ‘@’ symbol in the column list (to skip loading a particular column) was run on a temporary table
- Added support for the group_concat_max_len session variable that controls the maximum result length of
- Reduced occurrences of the error “Partition’s table metadata are out of sync for table”
- Fixed an issue where queries could incorrectly fail due to the max table memory limit being reached
- Fixed an inconsistency issue with backups taken while a multi-statement transaction was committing. In this case, the backup could contain a partial transaction.
- INSERT … SELECT queries against columnstore tables that are pushed down into the leaves now sort the newly inserted rows as specified by the CLUSTERED COLUMNSTORE INDEX before committing. This matches the behaviour of LOAD DATA into columnstore tables.
- Added performance enhancements for how quickly leaf nodes start and stop replication during any clustering operation (REBALANCE PARTITIONS, ATTACH LEAF, failover, etc.)
- Fixed an issue which would cause a LOAD DATA statement to hang if a connection to a leaf used by the LOAD timed out
- Upgraded Xerces library to version 3.2.0
- Upgraded Curl library to version 7.57.0
- Added functionality and stability improvements for query type variables
2017-11-28 Version 6.0.10
- Fixed a problem that caused invalid metadata to be generated by MemSQL 5.X that breaks MemSQL 6.0 on upgrade. (Similar to a fix made in 6.0.8).
- Set the default value for max_connection_threads to 192 on aggregators instead of 8192. This is the same default as earlier versions of MemSQL.
2017-11-15 Version 6.0.9
- Fixed a bug that caused errors when replicating the cluster internal database on clusters with more than 26 leaf nodes
2017-11-07 Version 6.0.8
- Fixed an issue with synchronously replicated databases where snapshots could sometimes cause a substantial replication slowdown. Asynchronous databases were not affected.
- Fixed an issue where MemSQL Pipelines crashed if an error message contained certain special characters
- Added cross-cluster replication between clusters with more than 63 nodes
- Added an additional check to remove invalid partition metadata upon upgrade to MemSQL 6
- Added an update to disallow using the CALL syntax on built-in functions
- Fixed a bug determining the definer of a UDF or stored procedure in cases where the user who created the UDF or stored procedure was logged in using a grant with a wildcard in it
2017-10-25 Version 6.0.7
- Improved MemSQL Pipelines to support using keywords as column names
- Fixed an issue with MemSQL Pipelines where pipeline metadata was not being copied properly during upgrade