6.0 Release Notes
MemSQL 6.0 includes enhancements to query processing with up to 80X performance improvement for group-by/aggregate queries, and broader SQL support. It also introduces extensibility features, and enhanced manageability and resiliency.
MemSQL 6.0 contains the following new capabilities:
MemSQL 6 introduces extensibility features as part of the new MemSQL Procedural SQL (MPSQL) language. MemSQL developers can create user-defined:
- Stored procedures (SPs)
- Scalar-valued functions (UDFs)
- Table-valued functions (TVFs)
- Aggregate functions (UDAFs)
SPs, UDFs, and UDAFs are compiled to machine code for high performance. Array and record types are supported in SPs and UDFs. SPs and UDFs also support exception handling. The MPSQL language will be familiar and straightforward to learn for database developers who have written functions and stored procedures in other database languages.
Query Language Features
The following new query language features are included in the release:
- Window functions with complex frames like “between 5 preceding and current row.” These are useful for moving averages needed for financial applications, electric utilities applications, and more.
- New window functions, including first_value, last_value, nth_value, percentile_disc, percentile_cont
- Cross-database joins and cross-database INSERT … SELECT
- UPDATE and DELETE through join
- UPDATE with subselect in SET clause
- Unenforced unique constraints
- INTERSECT, EXCEPT and MINUS
- Automatic cardinality statistics for columnstore
- Improved cost-based join optimization
- Improved hash join selection
- Improved optimization of UPDATE and DELETE queries
- Improved optimization of LEFT JOIN
- Enhanced join cardinality estimation
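To make the window-frame item at the top of this list concrete, here is a minimal Python sketch of what a frame such as “between 5 preceding and current row” computes for a moving average. It emulates the frame outside the database; the function name and the sample prices are hypothetical, and the SQL in the comment is the standard form of the frame clause, not syntax verified against MemSQL.

```python
from collections import deque


def moving_average(values, preceding=5):
    """Average each value with up to `preceding` earlier rows.

    Emulates the standard SQL frame (assumed form, shown for orientation only):
      AVG(x) OVER (ORDER BY t ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
    """
    window = deque(maxlen=preceding + 1)  # current row plus N preceding rows
    running_sum = 0.0
    result = []
    for v in values:
        if len(window) == window.maxlen:
            running_sum -= window[0]  # oldest row is about to leave the frame
        window.append(v)  # deque drops the oldest row automatically when full
        running_sum += v
        result.append(running_sum / len(window))
    return result


prices = [10, 12, 11, 13, 15, 14, 16, 18]  # hypothetical sample data
averages = moving_average(prices)
```

The first rows average over fewer than six values because the frame is clipped at the start of the partition, which is also how a SQL window frame behaves.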
Query execution improvements include:
- Enhanced operations on encoded (compressed) data
- Single Instruction, Multiple Data (SIMD) support using AVX2
- Queries like “select a, count(*) from t group by a” on columnstore tables speed up by as much as 80X
- Best-case performance for group-by/count(*) queries can exceed one billion rows per second per core on the latest processors
- Improvements of up to 30X on queries that:
  - Filter on run-length-encoded and dictionary-encoded data
  - Group by encoded data, with one or more sum/avg/min/max or other aggregates
  - Perform hash joins of small dimension tables to fact tables on encoded columns, through improved Bloom filter join acceleration
- More aggressive use of dictionary and run length encoding to benefit query execution
- Improved columnstore update performance
- Improved performance for access to JSON fields in columnstore tables
- Improved columnstore segment elimination based on hash joins
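The idea behind operating on encoded data in the list above can be sketched in a few lines of Python. This is a conceptual illustration only, not MemSQL's implementation: the function names and the sample column are hypothetical. It shows a query of the shape “select a, count(*) from t group by a” being answered by counting small integer dictionary codes and decoding each group once at the end.

```python
def dictionary_encode(column):
    """Return (dictionary, codes): each distinct value gets a small integer code."""
    dictionary = []
    code_of = {}
    codes = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def group_by_count_encoded(dictionary, codes):
    """Count rows per group using only the codes, decoding once per group."""
    counts = [0] * len(dictionary)
    for code in codes:
        counts[code] += 1  # array index instead of hashing the original value
    return {dictionary[code]: n for code, n in enumerate(counts)}


column_a = ["us", "eu", "us", "ap", "eu", "us"]  # hypothetical column values
dictionary, codes = dictionary_encode(column_a)
result = group_by_count_encoded(dictionary, codes)
```

Because the inner loop touches only small integers, it is the kind of work that vectorizes well, which is consistent with the SIMD item in the same list.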
Improvements to the columnstore include:
- Computed columns can be added to columnstore tables via
- The default columnstore segment size has been increased from 102,400 to 1,024,000 rows.
Data loading enhancements include:
- Variable and expression support in LOAD DATA
- Pipelines scheduling improvements
Manageability and Resiliency
Manageability and resiliency have been improved for the 6.0 release, including:
- Enhanced fault tolerance for leaf failover, master aggregator failover, and ALTER TABLE.
- Reduced blocking for backups, ALTER TABLE, leaf failover, and system management operations.
Benefits include fewer situations that require manual intervention, simplified application development, and improved availability.
MemSQL Streamliner was previously deprecated and is removed in MemSQL 6.0. For current Streamliner users, we recommend migrating to MemSQL Pipelines instead. MemSQL Pipelines provides increased stability, improved ingest performance, and exactly-once semantics. For more information about Pipelines, see the MemSQL Pipelines documentation.
What Do I Do With My Old Streamliner Spark Cluster?
To upgrade to MemSQL 6.0, you must first remove each of your Streamliner pipelines, and then uninstall the Spark cluster co-located with your MemSQL cluster. Once each of the pipelines has been removed, run memsql-ops spark-uninstall to uninstall the Spark cluster.
Can I Still Use Spark With MemSQL?
Yes, MemSQL supports Spark integration via the MemSQL Spark Connector. The connector allows you to leverage your existing Spark clusters to write data directly to MemSQL via a performant and easy-to-use API. For more information on the MemSQL Spark Connector, please see the MemSQL Spark 2.0 Connector GitHub page.
How Do I Get Started With Pipelines?
For more information on Pipelines, please see the MemSQL Pipelines documentation.
Maintenance Release Changelog
2018-04-02 Version 6.0.20
- Added a fix to avoid creating columnstore JSON segments that are too large for gzip compression to handle (larger than 2 GB).
- Fixed a rare crash when decoding a columnstore JSON column.
- Fixed an issue where a multi-statement write transaction committing during a BACKUP had a small chance of creating an inconsistent backup. The backup might have been able to see only part of the transaction in rare cases.
- Fixed a performance issue where queries run inside stored procedures performed unnecessary type conversions, causing performance regressions compared to running the same query outside of a stored procedure.
- Fixed a deadlock when running an ALTER TABLE MODIFY COLUMN against a table with computed columns.
- Fixed an issue with upgrading to MemSQL 6 for tables that have a column named “flags”. In rare cases, there can be a collision with a column named flags in an internal metadata table, which causes recovery to fail after upgrading to MemSQL 6.
2018-03-12 Version 6.0.18
- SET statements that use columns from the shard key are now allowed if the SET statement doesn’t change the value of the shard key columns (i.e., it sets the shard key columns to their existing values). Previously, SET statements that included the shard key columns were disallowed regardless of the value the columns were set to.
- The memory used by distributed joins for temporary processing is no longer limited by the setting of maximum_table_memory. It is now limited by maximum_memory, like other query processing memory usage.
- The FORCE_INTERPRETER_MODE query hint is now forwarded to the leaves so that queries run using the interpreter across the entire cluster instead of using the interpreter only on the aggregator.
- Fixed a server crash during code generation that distributed joins with correlated sub-selects could cause in some edge cases.
- Fixed extraneous reprovisions of the original master partition during a REBALANCE partition operation that moves the master to a different node in the cluster. The issue may occur if the columnstore merger or flusher is very active during the rebalance operation.
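The shard key rule in the first item of this list can be pictured with a small Python sketch. It only models the stated rule (an assignment to a shard key column is acceptable when it leaves the value unchanged); the function name, the row representation, and the column names are hypothetical and do not reflect how the engine performs the check.

```python
def update_allowed(row, assignments, shard_key_columns):
    """Return True if no shard key column would change value.

    row:               current column values, e.g. {"id": 7, "name": "a"}
    assignments:       column -> new value, as in an UPDATE ... SET clause
    shard_key_columns: set of column names that form the shard key
    """
    for column, new_value in assignments.items():
        if column in shard_key_columns and row[column] != new_value:
            return False  # this assignment would move the row to another shard
    return True


row = {"id": 7, "name": "a"}  # hypothetical row; "id" is the shard key
same_key = update_allowed(row, {"id": 7, "name": "b"}, {"id"})
changed_key = update_allowed(row, {"id": 8}, {"id"})
```

Under the earlier behavior described in that item, both calls would have been rejected simply because the shard key column appears in the assignment list.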
2018-03-02 Version 6.0.17
- Memory is now cached more aggressively in the buffer manager. MemSQL 6.0.15 introduced a change which caused MemSQL to stop caching memory if Total_server_memory grew to more than 90% of maximum_memory. Due to performance issues allocating memory on older Linux kernels (2.6), we now always allow 1% of maximum_memory to be cached even if memory use is high. We also allow caching until Total_server_memory is within 90% of maximum_memory or is greater than maximum_memory minus 10 GB (to maintain a ceiling of 10 GB of remaining RAM). The latter behavior is for machines with more physical memory, where stopping caching at 10% of remaining maximum_memory is too conservative.
- Allow MPSQL variables to be used as arguments to Table Valued Functions that are called within Stored Procedures.
- Fix crash that can occur when CALLing or ECHOing a Stored Procedure with arguments that are a different type than the type of the arguments used when the statement was compiled.
- Allow the return type of VOID to be explicitly specified when creating a Stored Procedure.
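The buffer manager caching rule in the first item of this list is easier to follow as arithmetic. The sketch below is one possible reading of that item, not the engine's actual logic: it assumes caching continues until memory use passes the later of the two thresholds (90% of maximum_memory, or maximum_memory minus 10 GB), and that up to 1% of maximum_memory may always stay cached. The function name and parameters are hypothetical.

```python
GB = 1024 ** 3


def may_cache(total_server_memory, cached_bytes, maximum_memory):
    """Decide whether the buffer manager may keep caching (assumed reading)."""
    # Assumption: up to 1% of maximum_memory may always remain cached.
    if cached_bytes * 100 < maximum_memory:
        return True
    # Assumption: stop at whichever threshold leaves less headroom, so the
    # reserved headroom never exceeds 10 GB on large machines.
    threshold = max(maximum_memory * 9 // 10, maximum_memory - 10 * GB)
    return total_server_memory < threshold


# Large machine: the 10 GB ceiling applies (threshold is 190 GB, not 180 GB).
large_ok = may_cache(185 * GB, 5 * GB, 200 * GB)
large_stop = may_cache(195 * GB, 5 * GB, 200 * GB)
# Small machine: the 90% rule applies (threshold is 28.8 GB).
small_stop = may_cache(30 * GB, 1 * GB, 32 * GB)
```

On a 200 GB setting, the 90% rule alone would reserve 20 GB of headroom; the 10 GB ceiling is what makes the newer behavior less conservative on larger machines.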
2018-02-16 Version 6.0.16
- Fixed an issue which caused the columnstore background merger to block a DROP TABLE operation for an extended period of time.
- Fixed a crash when restarting an aggregator with more than 256 users created.
2018-02-12 Version 6.0.15
- Improved performance of queries which project duplicate columns and require no processing on the aggregator.
- Fixed a crash when OPTIMIZER_STATISTICS information_schema table is queried concurrently with
- Improved performance of SHOW TABLE STATUS when there are many columnstore tables.
- Fixed an issue with the connection scheduler causing unnecessary context switch misses on rare occasions.
- Cache memory in the buffer manager less aggressively. The new behavior is to not cache if Total_server_memory use is within 90% of maximum_memory.
- Fixed a crash caused by SHOW commands with long
- Allow DDL commands to run concurrently with RESTORE and allow concurrent RESTOREs of different databases.
- Fix ‘table does not exist’ error when attempting to grant extensibility permissions to roles or groups.
- Fixed issue where table-valued functions (TVFs) with the same body but different parameter specifications would incorrectly use the same plan cache entry.
- Fix issue with cardinality estimation for comparisons of some binary columns to string constants, and some string columns to binary constants.
2018-01-29 Version 6.0.14
- Fix a crash during RESTORE database if an ALTER TABLE query had been run on a reference table prior to creating the backup.
- Reduce memory fragmentation in the variable allocator caused by the garbage collector looking for completely empty pages of memory.
- Fix a performance regression in singleton insert workloads introduced in MemSQL 6.
- Improved columnstore scan performance on many-core machines when using the GROUP BY clause in single-table aggregate queries.
- Fix a crash that could occur when optimizing queries with multiple window functions.
- Improved join optimization for queries with constant filters on hash indexes.
2018-01-11 Version 6.0.13
- Fix a compatibility issue which prevented the MariaDB 2.2.0 JDBC driver from connecting to MemSQL
- Fix an issue with auto statistics getting incorrectly refreshed after a node restart when auto stats were enabled with ALTER TABLE
- Reduce the memory used by the merger threads to refresh stale auto statistics
- Keep fewer deleted blob files that are less than 16KB in size in the columnstore replication window to reduce inode usage. The number of files kept is bound by disk usage (2 GB), but for small blob files 2 GB of disk usage could exhaust the inodes on the file system.
- Out of memory handling hardening for
- Enhancements to the columnstore flusher to reduce disk IO
- Include CPU feature flags in the persisted plan cache lookup for plans stored on disk to prevent the plan from being used on processors without the proper features (avx2, etc.). This protects against the underlying hardware MemSQL runs on changing after a node restart which can happen in some cloud environments. If this happens query plans will now recompile.
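The last item in this list, including CPU feature flags in the persisted plan cache lookup, can be illustrated with a short Python sketch. The real key format is not described in these notes, so everything here is an assumption made for illustration: the function name, the use of SHA-256, and the flag names passed in. The point is only that two hosts with different feature sets derive different keys, so a plan compiled for one is never looked up on the other.

```python
import hashlib


def plan_cache_key(query_text, cpu_flags):
    """Build a lookup key from the query text plus the CPU feature flags.

    Hypothetical format: flags are de-duplicated and sorted so the key does
    not depend on the order in which the flags were reported.
    """
    normalized_flags = ",".join(sorted(set(cpu_flags)))
    payload = query_text + "\x00" + normalized_flags
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


query = "select a, count(*) from t group by a"
key_with_avx2 = plan_cache_key(query, ["sse4_2", "avx2"])
key_without_avx2 = plan_cache_key(query, ["sse4_2"])
```

If a node restarts on hardware without avx2, the lookup with the new key misses, which corresponds to the recompile described in the item.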
2017-12-13 Version 6.0.11
- Reduced memory overhead for automatic statistics on columnstore tables
- MemSQL can now handle LZ4-compressed columnstore blob files larger than 4 GB
- Improved out of memory handling during binding and parsing
- The ROWS column in Information_schema.table_statistics now includes rows in the in-memory segment of a columnstore table. It also correctly reports the ROWS for tables with only a hash index.
- Fixed a bug which caused an error when a LOAD DATA that used an ‘@’ symbol in the column list to skip loading a particular column was run on a temporary table
- Added support for the group_concat_max_len session variable that controls the maximum result length of
- Reduced occurrences of the error “Partition’s table metadata are out of sync for table”
- Fixed an issue where queries could incorrectly fail due to the max table memory limit being reached
- Fixed an inconsistency issue with backups taken when a multi-statement transaction was committing. The backup might have been able to contain a partial transaction in this case.
- INSERT … SELECT queries against columnstore tables that are pushed down into the leaves now sort the newly inserted rows as specified by the CLUSTERED COLUMNSTORE INDEX before committing. This matches the behaviour of LOAD DATA into columnstore tables.
- Added performance enhancements for how quickly leaf nodes start and stop replication during any clustering operation (REBALANCE PARTITIONS, ATTACH LEAF, failover, etc.)
- Fixed an issue which would cause a LOAD DATA statement to hang if a connection to a leaf used by the LOAD timed out
- Upgraded Xerces library to version 3.2.0
- Upgraded Curl library to version 7.57.0
- Added functionality and stability improvements for query type variables
2017-11-28 Version 6.0.10
- Fixed a problem that caused invalid metadata to be generated by MemSQL 5.X that breaks MemSQL 6.0 on upgrade. (Similar to a fix made in 6.0.8).
- Set the default value for max_connection_threads to 192 on aggregators instead of 8192. This is the same default as earlier versions of MemSQL.
2017-11-15 Version 6.0.9
- Fixed bug that caused errors to occur when replicating the cluster internal database on clusters with more than 26 leaf nodes
2017-11-07 Version 6.0.8
- Fixed an issue with synchronously replicated databases where snapshots could sometimes cause a substantial replication slowdown. Asynchronous databases were not affected.
- Fixed an issue where MemSQL Pipelines crashed if an error message contained certain special characters
- Added cross-cluster replication between clusters with more than 63 nodes
- Added an additional check to remove invalid partition metadata upon upgrade to MemSQL 6
- Added an update to disallow using the CALL syntax on built-in functions
- Fixed a bug determining the definer of a UDF or stored procedure in cases where the user who created the UDF or stored procedure was logged in using a grant with a wildcard in it
2017-10-25 Version 6.0.7
- Improved MemSQL Pipelines to support using keywords as column names
- Fixed an issue with MemSQL Pipelines where pipeline metadata was not being copied properly during upgrade