6.8 Release Notes

The primary improvements in the MemSQL 6.8 release are enhancements to HDFS pipelines security and improved ad hoc (first run) query performance.

See the descriptions and changelog below for more information on these new features, as well as the other new features in the 6.8 release.

Data Loading

  • Advanced HDFS Pipelines that import data from HDFS using Kerberos and wire encryption.
  • Performance improvements to Pipelines extractors
  • Added TRAILING NULLCOLS and NULLS DEFINED BY clauses to LOAD DATA and CREATE PIPELINE

Query Optimization, Compilation, and Statistics

  • Can speed up ad-hoc (first query) run times by having queries interpreted and then dynamically compiled during first query execution. This behavior was available as experimental setting in MemSQL 6.7, but is production-ready and enabled by default in 6.8. See the interpret_first documentation for more details.
  • Performance improvements for some outer, anti-, and semi-joins
  • Performance improvements to ROLLUP and CUBE

Resource Governor Improvements

  • LOAD DATA operations run in the resource pool for the current connection where the load operation is running
  • Stored procedures run in the resource pool of the current connection from where they are called
  • Query optimization always runs in the resource pool of the current connection

Functional Extensions

  • A subquery does not require an alias, assuming that removing the alias doesn’t create ambiguity
  • Increased the precedence of || as concat (under sql_mode = PIPES_AS_CONCAT) to be compatible with MySQL and Postgres
  • Added SPLIT() and JSON_TO_ARRAY()
  • Enabled TO_DATE() to support format strings with time-related format options (HH, SS, etc.)
  • Enabled TO_DATE() to support the DD format option
  • Enabled TO_TIMESTAMP() to support YY and FF format options
  • Enabled TO_TIMESTAMP(), TO_DATE(), and TO_CHAR() to support the D format option
  • Enabled TO_TIMESTAMP() and TO_DATE() to support using different punctuation as separator
  • Enabled TO_TIMESTAMP() and TO_DATE() to raise an error instead of returning NULL for certain error cases
  • Enabled TO_CHAR() to support AM and PM
  • Modified how TO_TIMESTAMP() parses 12 when using the HH format option

Maintenance Release Changelog

2019-11-11 Version 6.8.13

  • Fixed issues with the information_schema.MV_SYSINFO tables used by MemSQL Studio to manage CPU usage of running processes.
  • Added the performance_schema read-only engine variable. This allows newer MySQL Connector/J and JDBC drivers to connect to MemSQL.
  • Now, ALTER TABLE and TRUNCATE TABLE only wait for garbage collection to complete if a previous operation triggered garbage collection on the same table. Previously, ALTER TABLE and TRUNCATE TABLE waited for garbage collection to complete on any table.
  • Now, improve performance of garbage collection on the plancache, when the plancache is large. Garbage collection on the plancache is triggered when the plan_expiration_minutes engine variable expires.

2019-11-04 Version 6.8.12

  • Fixed an issue causing the workload manager to share the same queue for memory intensive, connection intensive, and thread intensive queries. Now, the workload manager uses a queue for memory intensive queries and another queue for connection and thread intensive queries.
  • Fixed a crash that occurred when querying the information_schema.PIPELINES_ERRORS table.
  • Fixed an issue that caused resource pools to gather incorrect memory use statistics for some queries.
  • Fixed a crash that occurred when a stored procedure used by a pipeline writes to a different database than where the pipeline is located.

2019-10-21 Version 6.8.11

  • Fixed an issue where running SHOW PLANCACHE or querying information_schema.PLANCACHE could cause a server crash if the engine runs out of memory while performing these operations.
  • Fixed an issue where running KILL QUERY on an ALTER VIEW query could cause new queries accessing the view to go into an infinite loop.
  • Reduced the query compilation time for some complex queries.
  • Now, SET PASSWORD can be used to set the password for the root user. This applies when the sync_permissions engine variable is set to ON.
  • Fixed an issue where no error was returned when large negative row-ids were present in an internal table. These row-ids are used when loading data using pipelines.
  • Fixed an out-of-memory issue that could occur in queries with multiple subselects and a WHERE clause containing a constant predicate.
  • Fixed a crash that occurred when running a REVOKE statement containing an ON clause without a database name.
  • Fixed an out-of-memory issue that occurred during the compilation of query shapes having many unions. These query shapes use many parameters.

2019-09-16 Version 6.8.10

  • Now, report an error when running out of memory while allocating stack space. This applies when interpreter_mode is set to interpret_first.
  • Added support for server-side encryption when backing up to S3.
  • Added the max_async_compilation_concurrency engine variable to limit the number of compiles that can run at the same time when interpreter_mode is set to interpret_first. Also, changed the default number of concurrent async compiles to half the number of cores on the host machine. It was previously set to the number of cores on the host machine.
  • Added the information_schema.MV_WORKLOAD_MANAGEMENT_STATUS table, which shows the status of workload management.
  • Added an optimization for SELECT queries that have specific shapes. The optimization allows these queries to compile faster.
  • Added the max_compilation_time_s and max_compilation_memory_mb engine variables. The former is the timeout for compiling a query. If the timeout is reached, an error is returned. The latter is the maximum amount of memory used to compile a query. If this maximum is reached, an error is returned.
  • Added the information_schema.LMV_ASYNC_COMPILES table, which shows the async compiles that are running or queued. Also, added the KILL QUERY COMPILE command that allows you to kill compiles in the LMV_ASYNC_COMPILES table.

2019-09-03 Version 6.8.9

  • Fixed an issue where TRUNCATE TABLE caused unrecoverable slave partitions.
  • Now, allow a Kafka pipeline to continue running if the latest batch partition offset is lower than the earliest offset.
  • Fixed the permission check on CREATE TABLE ... SELECT queries to correctly check for the SELECT permission type on the tables being selected. Prior to the fix, the tables being selected were checked for the permission type required by the previous compiled query.
  • Fixed an issue where the TOTAL_QUEUE_TIME field in information_schema.MV_RESOURCE_POOL_STATUS would become negative.
  • Now, do not allow the setting MAX_QUEUE_DEPTH to be specified in a resource pool when the MAX_CONCURRENCY setting is not specified in the resource pool.
  • Now, reduce the amount of tracing done by EXPLAIN CLEAR ORPHAN DATABASES.
  • Fixed an issue where a user could run GRANT <privileges> ON <resources> TO <role> if the user didn’t have the privileges being granted. Now, this GRANT query fails if the user doesn’t have these privileges.
  • Fixed an issue where cluster-wide permissions could be granted, at the database level, to roles. These permissions can no longer be granted at the database level.

2019-08-19 Version 6.8.8

  • Now, improve the performance of restoring a database that contains many small columnstore BLOB files.
  • Fixed a crash that occurred when LOAD DATA encountered an error on a line of input whose text is larger than 100 MB.
  • Fixed an issue where file descriptors were sometimes left open when processes external to the engine (such as pipeline processes) encounter errors.

2019-08-12 Version 6.8.7

  • Added information_schema tables containing the MV_GLOBAL prefix. These tables can be queried to view global variable settings and server status information.
  • Fixed an issue where a global sync variable that was set on the master aggregator would not sync to other nodes in the cluster.
  • Changed how successive row-id values are generated in an internal system table. Now, these values are more likely to be the same. These values are used when loading data using pipelines.

2019-08-05 Version 6.8.6

  • Added information_schema tables containing the MV_SYSINFO prefix. These tables can be queried to monitor the CPU, memory, disk, and network usage of the cluster.
  • Added the information_schema.MV_PROCESSLIST table. This table can be queried to view the queries that are currently running in the cluster.
  • Now, improve the performance of LOAD DATA when it loads a file from a directory that contains many files.
  • Fixed a bug where memory usage being tracked by resource pools is negative. Also, corrected some tracelog messages regarding resource pool memory tracking.

2019-07-29 Version 6.8.5

  • Fixed an issue that caused user defined function (UDF) security checks to be done on leaf nodes instead of on the aggregator. This would cause a permission denied error because users created by a client application don’t exist on leaf nodes.

2019-07-22 Version 6.8.4

  • Removed some unneeded memory allocations that were being done before running parallel queries. This addresses the high CPU usage that sometimes occurred when running these queries.
  • Now, clear the DNS cache on secondary cluster nodes when nodes are modified on the primary cluster.
  • Fixed an issue where ALTER PIPELINE would go into an infinite loop.
  • Fixed a rarely occurring issue where resource pools were incorrectly tracking memory usage.
  • Fixed an issue where the use of AVX2 and SSE2 by LOAD DATA was disabled.
  • Now, if there are too many threads waiting for child threads to be scheduled, switch to non-parallel execution if possible. Fail the query otherwise. Previously, there was a thirty second timeout for launching child threads. This is too long of a wait for some workloads.
  • Now, remove MemSQL’s dependency on the libnsl.so.1 library to make it easier to deploy MemSQL on flavors/versions of Linux such as RHEL 8 that don’t have libnsl.so.1 installed by default.
  • Fixed an issue where the constant folding optimization was disabled in edge cases. The fix speeds up queries that were affected by the issue.
  • Now, address a deadlock that is possible during REBALANCE PARTITIONS if the child aggregator runs out of connection threads.
  • Fixed a syncing issue seen when files were written to the plancache during code generation. This issue caused code generation failures and unrecoverable databases.
  • Now, disable correlated subselects when using ROLLUP() and CUBE(), as it is currently unsupported. Also, fixed an issue where an optimization incorrectly reordered a GROUP BY and join when ROLLUP() or CUBE() was being used.
  • Added new configuration settings to Kafka pipelines to support explicit Kerberos credential caches (sasl.kerberos.ccache) and disabling of kinit (sasl.kerberos.disable.kinit).
  • Now, a pipeline waits for the reference database to be in sync before the pipeline compiles.

2019-06-24 Version 6.8.3

  • Fixed an issue when interpreter_mode is set to interpret_first, where incorrect code was generated. This issue could cause a crash or an infinite loop.
  • Fixed an issue where memsqlctl wouldn’t show license details.

2019-06-10 Version 6.8.2

  • Fixed an issue that was causing high system CPU usage. This issue occurred when running context switch heavy workloads. These workloads contain many fast executing queries that run against multiple connections.
  • Now, a leaf node fails over to its pair if, for sixty seconds, the leaf has all of its threads running queries (as specified by max_connection_threads) and is not maintaining a minimum throughput of five queries per second.
  • Now, a leaf node fails over to its pair if the leaf’s disk usage falls below the value of minimal_disk_space.
  • Fixed an issue where LAST_INSERT_ID(<expression>) was returning an incorrect value.
  • Now, security management commands such as CREATE GROUP, CREATE ROLE and CREATE USER can be run inside of stored procedures.
  • Now, improve the performance of SELECT ... INTO S3.
  • Now, the value of the SIZE field in information_schema.MV_BACKUP_HISTORY is stored in megabytes instead of bytes.
  • Fixed an issue where a user authenticated with Pluggable Authentication Module (PAM) could not run ANALYZE TABLE.
  • Fixed an issue where the bootstrap aggregator would not work on machines where the only configured address is the loopback address.

2019-05-14 Version 6.8.1

  • Initial GA release of MemSQL 6.8
Was this article useful?