6.0 Release Notes

MemSQL 6.0 includes enhancements to query processing with up to 80X performance improvement for group-by/aggregate queries, and broader SQL support. It also introduces extensibility features, and enhanced manageability and resiliency.

MemSQL 6.0 contains the following new capabilities:

Extensibility

MemSQL 6 introduces extensibility features as part of the new MemSQL Procedural SQL (MPSQL) language. MemSQL developers can create user-defined:

  • Stored procedures (SPs)
  • Scalar-valued functions (UDFs)
  • Table-valued functions (TVFs)
  • Aggregate functions (UDAFs)

SPs, UDFs, and UDAFs are compiled to machine code for high performance. Array and record types are supported in SPs and UDFs. SPs and UDFs also support exception handling. The MPSQL language will be familiar and straightforward to learn for database developers who have written functions and stored procedures in other database languages.

Query Processing

Query Language Features

The following new query language features are included in the release:

  • Window functions with complex frames like “between 5 preceding and current row.” These are useful for moving averages needed for financial applications, electric utilities applications, and more.
  • New window functions, including first_value, last_value, nth_value, percentile_disc, percentile_cont
  • Cross-database joins and cross-database INSERT … SELECT
  • UPDATE and DELETE through join
  • UPDATE with subselect in SET clause
  • Unenforced unique constraints
  • INTERSECT, EXCEPT and MINUS
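
As a rough illustration of what a frame like “between 5 preceding and current row” computes, the sketch below reproduces a moving average in plain Python. It assumes a row-based frame (each row is averaged with up to five rows before it) and is not MemSQL code; the function name `moving_average` and the sample `prices` list are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], preceding: int = 5) -> List[float]:
    """Average each row with up to `preceding` rows before it.

    Mirrors a row-based frame of "<preceding> preceding and current row":
    the earliest rows simply get a shorter frame.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - preceding)   # frame start, clamped at the first row
        frame = values[start:i + 1]     # frame includes the current row
        result.append(sum(frame) / len(frame))
    return result


# Hypothetical daily prices, already ordered by date.
prices = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0]
print(moving_average(prices))
```

The last value averages the six most recent rows only, which is the behavior a moving average over a sliding frame is expected to have.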

Query Optimization

  • Automatic cardinality statistics for columnstore
  • Improved cost-based join optimization
  • Improved hash join selection
  • Improved optimization of UPDATE and DELETE queries
  • Improved optimization of LEFT JOIN
  • Enhanced join cardinality estimation

Query Execution

Query execution improvements include:

  • Enhanced operations on encoded (compressed) data
  • Single-Instruction Multiple Data (SIMD) support using AVX-2
  • Queries like “select a, count(*) from t group by a” on columnstore tables speed up by as much as 80X
  • Best-case performance for group-by/count(*) queries can exceed one billion rows per second per core on the latest processors
  • Improvements of up to 30X on queries that:
    • Filter on run-length- and dictionary-encoded data
    • Group-by on encoded data, with one or more sum/avg/min/max or other aggregates
    • Perform hash joins of small dimension tables to fact tables on encoded columns, through improved Bloom filter join acceleration
  • More aggressive use of dictionary and run length encoding to benefit query execution
  • Improved columnstore update performance
  • Improved performance for access to JSON fields in columnstore tables
  • Improved columnstore segment elimination based on hash joins
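
To give a sense of why operating on encoded data can be faster, the sketch below runs a group-by/count and a filter directly on run-length-encoded values, touching one entry per run instead of one per row. It is a simplified illustration in Python, not a description of MemSQL's actual engine; the helper names (`rle_encode`, `group_count_on_runs`, `filter_runs`) are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Callable, Hashable, Iterable, List, Tuple

Run = Tuple[Hashable, int]  # (value, number of consecutive rows with that value)


def rle_encode(column: Iterable[Hashable]) -> List[Run]:
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def group_count_on_runs(runs: Iterable[Run]) -> Counter:
    """Equivalent of "select a, count(*) ... group by a", computed per run."""
    counts: Counter = Counter()
    for value, length in runs:
        counts[value] += length  # one addition per run, not per row
    return counts


def filter_runs(runs: Iterable[Run], predicate: Callable[[Hashable], bool]) -> List[Run]:
    """Apply a filter once per run; whole runs are kept or dropped."""
    return [(value, length) for value, length in runs if predicate(value)]


column = ["a", "a", "a", "b", "b", "b", "b", "a", "c"]
runs = rle_encode(column)
print(runs)
print(group_count_on_runs(runs))
print(filter_runs(runs, lambda v: v != "b"))
```

The counts match what a row-by-row scan would produce, but the loop length depends on the number of runs, which is small when the data compresses well.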

Columnstore

Improvements to the columnstore include:

  • Computed columns can be added to columnstore tables via ALTER TABLE.
  • The default columnstore segment size has been increased from 102,400 to 1,024,000.

Data Loading

Data loading enhancements include:

  • Variable and expression support in LOAD DATA
  • Pipelines scheduling improvements

Manageability and Resiliency

Manageability and resiliency have been improved for the 6.0 release, including:

  • Enhanced fault tolerance for leaf failover, master aggregator failover, and ALTER TABLE.
  • Reduced blocking for backups, ALTER TABLE, leaf failover, and system management operations.

Benefits include fewer situations that require manual intervention, simplified application development, and improved availability.

Streamliner Deprecation

MemSQL Streamliner was previously deprecated and is removed in MemSQL 6.0. For current Streamliner users, we recommend migrating to MemSQL Pipelines instead. MemSQL Pipelines provides increased stability, improved ingest performance, and exactly-once semantics. For more information about Pipelines, see the MemSQL Pipelines documentation.

What Do I Do With My Old Streamliner Spark Cluster?

To upgrade to MemSQL 6.0, you must first remove each of your Streamliner pipelines, and then uninstall the Spark Cluster co-located with your MemSQL cluster. Once each of the pipelines has been removed, simply run memsql-ops spark-uninstall to uninstall the Spark Cluster.

Can I Still Use Spark With MemSQL?

Yes, MemSQL supports Spark integration via the MemSQL Spark Connector. The connector allows you to leverage your existing Spark clusters to write data directly to MemSQL via a performant and easy-to-use API. For more information on the MemSQL Spark Connector, please see the MemSQL Spark 2.0 Connector GitHub page.

How Do I Get Started With Pipelines?

For more information on Pipelines, please see the MemSQL Pipelines documentation.

Maintenance Release Changelog

2018-07-16 Version 6.0.26

  • If the query optimizer needs to run a sampling query with an IN list, it now runs that query in LLVM code-generation mode instead of interpreter mode to reduce query optimization times.
  • Fixed a race condition during a REBALANCE operation which could cause the columnstore flusher thread to exit. Without a flusher thread, columnstore data accumulates in the in-memory segment, causing memory pressure.
  • No longer holds a snapshot transaction open when copying the columnstore blob files during a BACKUP. Keeping a snapshot transaction open prevented the rowstore garbage collector from cleaning up deleted rows visible to the snapshot, which caused memory use to increase during a backup.
  • Fixed an issue with accuracy of statistics collected on MemSQL 5.5-5.8 after upgrade to 6.0. (In earlier versions of 6.0, running ANALYZE TABLE after upgrade also fixes the issue.)

2018-07-02 Version 6.0.25

  • MemSQL can now repair two torn logs as long as the 2nd log is less than 25 MB in size.
  • Fixed an issue where a potential crash could occur when running cross-database joins during a failover.
  • Fixed usage of IN list filters with parameters from stored procedures in query-type variables.
  • Fixed an issue where MemSQL would become unresponsive if a cross data center replication operation was stopped while the replication process was still being initialized.

2018-06-18 Version 6.0.24

  • No longer stores duplicate plans in-memory inside the plan cache on leaves when the same query is executed on different aggregators. After this fix, a single entry in the leaf plan cache can be used by any aggregator that runs a given query.

2018-06-04 Version 6.0.23

  • Fixed a log corruption issue that occurred when doing multi-inserts into temporary tables in the same transaction as writes into non-temporary tables.
  • Now throws an error when a subquery is used inside of an expression in the insert VALUES(..) clause. Subqueries outside of expressions were already blocked in the VALUES clause.

2018-05-21 Version 6.0.22

  • Fixed a performance issue impacting update and delete queries with joins on columnstore tables.
  • Ensure nodes stop responding to heartbeats early on during shutdown so that failover happens as quickly as possible.
  • Fixed the group_concat_max_len system variable for more complex query shapes.
  • Now throws an error if a user-defined function (UDF) or stored procedure has more than 100 parameters.

2018-05-01 Version 6.0.21

  • Reference databases are now recovered first when a leaf node restarts. In some edge cases, reference databases need to be online in order to get partition databases attached back into the cluster.
  • Handled a failure case that occurred when replicating columnstore blob files if the connection was lost at particular points while downloading a blob file. Previously, it was possible for blob files to be missed.
  • Repaired torn log files in more cases during recovery. MemSQL can now recover when two log files have torn tails. This change automatically repairs the log in cases where the server previously generated “a future log file exists. Unable to recover” error messages.
  • Two new variables called enable_disk_plan_expiration and disk_plan_expiration_minutes enable cleanup of rarely used compiled query plans from the plancache directory on disk. When turned on, enable_disk_plan_expiration causes any plan that hasn’t been read from disk in the last disk_plan_expiration_minutes minutes to be deleted from disk. By default, enable_disk_plan_expiration is set to false (which means the feature is off by default), and disk_plan_expiration_minutes is set to 20160 (14 days).
  • Fixed issues upgrading from MemSQL 5.0 or 5.1 to MemSQL 6 if ANALYZE TABLE had been run before upgrade.
  • Fixed a memory leak in LOAD DATA statements that use a WHERE clause.
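
The expiration rule described for enable_disk_plan_expiration and disk_plan_expiration_minutes can be sketched as follows. This is an illustrative Python model of the stated behavior only, not MemSQL's implementation; the function `expired_plans`, the plan names, and the way last-read times are tracked are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Default stated in the notes above: 20160 minutes, which is 14 days.
DEFAULT_EXPIRATION_MINUTES = 20160
assert timedelta(minutes=DEFAULT_EXPIRATION_MINUTES) == timedelta(days=14)


def expired_plans(
    last_read: Dict[str, datetime],
    now: datetime,
    enabled: bool = False,  # mirrors the "off by default" setting
    expiration_minutes: int = DEFAULT_EXPIRATION_MINUTES,
) -> List[str]:
    """Return the plans that have not been read within the expiration window."""
    if not enabled:
        return []  # feature off: nothing is ever deleted
    cutoff = now - timedelta(minutes=expiration_minutes)
    return sorted(name for name, read_at in last_read.items() if read_at < cutoff)


now = datetime(2018, 5, 1)
last_read = {
    "plan_a": now - timedelta(days=15),  # older than the 14-day window
    "plan_b": now - timedelta(days=1),   # read recently
}
print(expired_plans(last_read, now, enabled=True))
```

With the default window, only the plan last read 15 days ago is selected; with the feature disabled, the list is always empty.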

2018-04-02 Version 6.0.20

  • Added a fix to avoid creating columnstore JSON segments that are too large for gzip compression to handle (larger than 2 GB).
  • Fixed a rare crash when decoding a columnstore JSON column.
  • Fixed an issue where a multi-statement write transaction committing during a BACKUP had a small chance of creating an inconsistent backup. The backup might have been able to see only part of the transaction in rare cases.
  • Fixed a performance issue where queries run inside stored procedures performed unnecessary type conversions, causing performance regressions compared to running the same query outside of a stored procedure.
  • Fixed a deadlock when running an ALTER TABLE MODIFY COLUMN against a table with computed columns.
  • Fixed an issue with upgrading to MemSQL 6 for tables that have a column named “flags”. In rare cases, there can be a collision with a column named flags in an internal metadata table, which causes recovery to fail after upgrading to MemSQL 6.

2018-03-12 Version 6.0.18

  • Now allow UPDATE queries with SET statements that use columns from the shard key if the SET statement doesn’t change the value of the shard key columns (i.e. it sets the shard key columns to their existing values). Previously, SET statements that included the shard key columns were disallowed regardless of the value the columns were set to.
  • The memory used by distributed joins for temporary processing is no longer limited by the setting of maximum_table_memory. It is now limited by maximum_memory like other query processing memory usage.
  • The FORCE_INTERPRETER_MODE query hint is now forwarded to the leaves so that queries run using the interpreter across the entire cluster instead of using the interpreter only on the aggregator.
  • Fixed a crash during code generation that occurred in some edge cases for distributed joins with correlated sub-selects.
  • Fixed extraneous reprovisions of the original master partition during a REBALANCE PARTITIONS operation that moves the master to a different node in the cluster. The issue could occur if the columnstore merger or flusher was very active during the rebalance operation.

2018-03-02 Version 6.0.17

  • Memory is now cached more aggressively in the buffer manager. MemSQL 6.0.15 introduced a change which caused MemSQL to stop caching memory if Total_server_memory grew to more than 90% of maximum_memory. Due to performance issues allocating memory on older Linux kernels (2.6), we now always allow 1% of maximum_memory to be cached even if memory use is high. We also allow caching until Total_server_memory is within 90% of maximum_memory or until Total_server_memory is greater than maximum_memory minus 10 GB (to maintain a ceiling of 10 GB of remaining RAM). The latter behavior is for machines with more physical memory, where stopping caching at 10% of remaining maximum_memory is too conservative.
  • Allow MPSQL variables to be used as arguments to Table Valued Functions that are called within Stored Procedures.
  • Fix crash that can occur when CALLing or ECHOing a Stored Procedure with arguments that are a different type than the type of the arguments used when the statement was compiled.
  • Allow the return type of VOID to be explicitly specified when creating a Stored Procedure.
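
The buffer manager caching rule in the first item of this release can be read in more than one way. The sketch below models one plausible reading: caching continues until Total_server_memory reaches the larger of 90% of maximum_memory and maximum_memory minus 10 GB, and 1% of maximum_memory may always be cached. The function names, the use of megabytes, and the 1 GB = 1024 MB conversion are assumptions for illustration, not MemSQL internals.

```python
TEN_GB_MB = 10 * 1024  # assumption: 1 GB = 1024 MB


def caching_threshold_mb(maximum_memory_mb: int) -> int:
    """Total_server_memory level at which caching stops (apart from the 1% floor).

    Takes the more permissive of the two limits described in the notes:
    90% of maximum_memory, or maximum_memory minus 10 GB.
    """
    ninety_percent = maximum_memory_mb * 9 // 10
    ten_gb_headroom = maximum_memory_mb - TEN_GB_MB
    return max(ninety_percent, ten_gb_headroom)


def guaranteed_cache_mb(maximum_memory_mb: int) -> int:
    """Amount that may always be cached: 1% of maximum_memory."""
    return maximum_memory_mb // 100


def may_cache(total_server_memory_mb: int, cached_mb: int, maximum_memory_mb: int) -> bool:
    """Decide whether more memory may be cached under this reading of the rule."""
    if cached_mb < guaranteed_cache_mb(maximum_memory_mb):
        return True  # the 1% floor applies even when memory use is high
    return total_server_memory_mb < caching_threshold_mb(maximum_memory_mb)


# A 50 GB limit is governed by the 90% rule; a 200 GB limit by the 10 GB headroom rule.
print(caching_threshold_mb(50 * 1024))   # 46080
print(caching_threshold_mb(200 * 1024))  # 194560
```

Under this reading the two limits coincide at a maximum_memory of 100 GB; above that, the 10 GB headroom limit is the one that applies.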

2018-02-16 Version 6.0.16

  • Fixed an issue which caused the columnstore background merger to block a DROP TABLE operation for an extended period of time.
  • Fixed a crash when restarting an aggregator with more than 256 users created.

2018-02-12 Version 6.0.15

  • Improved performance of queries which project duplicate columns and require no processing on the aggregator.
  • Fixed a crash when OPTIMIZER_STATISTICS information_schema table is queried concurrently with DROP DATABASE.
  • Improved performance of SHOW TABLE STATUS when there are many columnstore tables.
  • Fixed an issue with the connection scheduler causing unnecessary context switch misses on rare occasions.
  • Cache memory in the buffer manager less aggressively. The new behavior is to not cache if Total_server_memory use is within 90% of maximum_memory.
  • Fixed a crash caused by SHOW commands with long LIKE strings.
  • Allow DDL commands to run concurrently with BACKUP and RESTORE and allow concurrent RESTORES of different databases.
  • Fix ‘table does not exist’ error when attempting to grant extensibility permissions to roles or groups.
  • Fixed issue where table-valued functions (TVFs) with the same body but different parameter specifications would incorrectly use the same plan cache entry.
  • Fix issue with cardinality estimation for comparisons of some binary columns to string constants, and some string columns to binary constants.

2018-01-29 Version 6.0.14

  • Fix a crash during RESTORE database if an ALTER TABLE query had been run on a reference table prior to creating the backup.
  • Reduce memory fragmentation in the variable allocator caused by the garbage collector looking for completely empty pages of memory.
  • Fix a performance regression in singleton insert workloads introduced in MemSQL 6.
  • Improved columnstore scan performance on many-core machines when using the GROUP BY clause in single-table aggregate queries.
  • Fix a crash that could occur when optimizing queries with multiple window functions.
  • Improved join optimization for queries with constant filters on hash indexes.

2018-01-11 Version 6.0.13

  • Fix a compatibility issue which prevented the MariaDB 2.2.0 JDBC driver from connecting to MemSQL
  • Fix an issue with auto statistics getting incorrectly refreshed after a node restart when auto stats were enabled with ALTER TABLE
  • Reduce the memory used by the merger threads to refresh stale auto statistics
  • Keep fewer deleted blob files that are less than 16 KB in size in the columnstore replication window to reduce inode usage. The number of files kept is bounded by disk usage (2 GB), but for small blob files, 2 GB of disk usage could exhaust the inodes on the file system.
  • Out of memory handling hardening for cluster database replication
  • Enhancements to the columnstore flusher to reduce disk IO
  • Include CPU feature flags in the persisted plan cache lookup for plans stored on disk to prevent a plan from being used on processors without the proper features (avx2, etc.). This protects against the underlying hardware MemSQL runs on changing after a node restart, which can happen in some cloud environments. If this happens, query plans will now recompile.
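
The idea of including CPU feature flags in the plan cache lookup can be sketched as a cache key that depends on both the query and the processor features. The Python below is a hypothetical illustration, not MemSQL's on-disk format; the function `plan_cache_key`, the use of SHA-256, and the sample flag names other than avx2 are assumptions.

```python
import hashlib
from typing import Iterable


def plan_cache_key(query_text: str, cpu_flags: Iterable[str]) -> str:
    """Build a lookup key from the query text plus the CPU feature flags.

    Flags are de-duplicated and sorted so that the key does not depend on
    the order in which the hardware reports them.
    """
    flags = ",".join(sorted(set(cpu_flags)))
    payload = f"{query_text}\n{flags}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


query = "select a, count(*) from t group by a"
key_with_avx2 = plan_cache_key(query, ["sse4_2", "avx2"])
key_without_avx2 = plan_cache_key(query, ["sse4_2"])

# A plan compiled on AVX2 hardware is not found on a host without AVX2,
# so the query is recompiled instead of reusing an incompatible plan.
print(key_with_avx2 != key_without_avx2)
```

Because the flags are part of the key, a change in hardware after a restart results in a cache miss rather than the reuse of a plan built for different processor features.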

2017-12-13 Version 6.0.11

  • Reduced memory overhead for automatic statistics on columnstore tables
  • MemSQL can now handle LZ4 compressed columnstore blob files larger than 4 GB
  • Improved out of memory handling during binding and parsing
  • The ROWS column in Information_schema.table_statistics now includes rows in the in-memory segment of a columnstore table. It also correctly reports the ROWS for tables with only a hash index.
  • Fixed a bug that caused an error when LOAD DATA was run on a temporary table with an ‘@’ symbol in the column list to skip loading a particular column
  • Added support for the group_concat_max_len session variable that controls the maximum result length of GROUP_CONCAT in bytes
  • Reduced occurrences of the error “Partition’s table metadata are out of sync for table”
  • Fixed an issue where queries could incorrectly fail due to the max table memory limit being reached
  • Fixed an inconsistency issue with backups taken when a multi-statement transaction was committing. The backup might have been able to contain a partial transaction in this case.
  • INSERT … SELECT queries against columnstore tables that are pushed down into the leaves now sort the newly inserted rows as specified by the CLUSTERED COLUMNSTORE INDEX before committing. This matches the behaviour of LOAD DATA into columnstore tables.
  • Added performance enhancements for how quickly leaf nodes start and stop replication during any clustering operation (REBALANCE PARTITIONS, ATTACH LEAF, failover, etc.)
  • Fixed an issue which would cause a LOAD DATA statement to hang if a connection to a leaf used by the LOAD timed out
  • Upgraded Xerces library to version 3.2.0
  • Upgraded Curl library to version 7.57.0
  • Added functionality and stability improvements for query type variables

2017-11-28 Version 6.0.10

  • Fixed a problem where invalid metadata generated by MemSQL 5.X broke MemSQL 6.0 on upgrade (similar to a fix made in 6.0.8).
  • Set the default value for max_connection_threads to 192 on aggregators instead of 8192. This is the same default as earlier versions of MemSQL.

2017-11-15 Version 6.0.9

  • Fixed bug that caused errors to occur when replicating the cluster internal database on clusters with more than 26 leaf nodes

2017-11-07 Version 6.0.8

  • Fixed an issue with synchronously replicated databases where snapshots could sometimes cause a substantial replication slowdown. Asynchronous databases were not affected.
  • Fixed an issue where MemSQL Pipelines crashed if an error message contained certain special characters
  • Added cross-cluster replication between clusters with more than 63 nodes
  • Added an additional check to remove invalid partition metadata upon upgrade to MemSQL 6
  • Added an update to disallow using the CALL syntax on built-in functions
  • Fixed a bug determining the definer of a UDF or stored procedure in cases where the user who created the UDF or stored procedure was logged in using a grant with a wildcard in it

2017-10-25 Version 6.0.7

  • Improved MemSQL Pipelines to support using keywords as column names
  • Fixed an issue with MemSQL Pipelines where pipeline metadata was not being copied properly during upgrade

2017-10-18 Version 6.0.6

  • Initial GA release of MemSQL 6.0.
