MemSQL 5.5 introduces MemSQL Pipelines, a native mechanism for scalable real-time data ingestion from a wide variety of streaming and static data sources. In addition, this release delivers 3x-5x faster query performance, driven by a novel hash-table design, the use of Bloom filters, native support for semi-joins and anti-joins, and improved concurrency management for distributed joins. MemSQL 5.5 also improves ease of use with new functionality such as Query Profiling and Workload Management, a new and more scalable version of MemSQL Ops, and faster recovery for row stores.
MemSQL Pipelines is a native database feature that supports real-time data ingestion from Kafka streams with exactly-once semantics. Pipelines provides a robust, scalable, and high-performance way to extract, transform, and load external data for distributed workloads. See the Pipelines documentation to learn more.
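These notes do not describe how Pipelines achieves exactly-once semantics. A common approach, assumed here, is to commit the consumed stream offsets in the same transaction as the loaded rows. A retry after a failure then neither skips nor duplicates records. The Python sketch below shows that general idea with an in-memory destination and a list standing in for one Kafka partition. It is not MemSQL's implementation, and the names Store and ingest_batch are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical destination holding loaded rows plus the last committed offset."""
    rows: list = field(default_factory=list)
    committed_offset: int = 0

    def commit(self, new_rows, new_offset):
        # Stands in for a database transaction: rows and offset are updated
        # together, so they can never disagree in this single-threaded sketch.
        self.rows, self.committed_offset = self.rows + new_rows, new_offset


def ingest_batch(stream, store, batch_size, fail=False):
    """Extract, transform, and load one batch starting at the committed offset."""
    start = store.committed_offset
    batch = stream[start:start + batch_size]
    transformed = [record.upper() for record in batch]  # stand-in transform
    if fail:
        # Simulated crash before commit: no rows written, offset unchanged.
        return 0
    store.commit(transformed, start + len(batch))
    return len(batch)


stream = ["a", "b", "c", "d", "e"]  # stand-in for one Kafka partition
store = Store()
ingest_batch(stream, store, 2)             # loads a, b
ingest_batch(stream, store, 2, fail=True)  # crash: c, d are not committed
ingest_batch(stream, store, 2)             # retry loads c, d exactly once
ingest_batch(stream, store, 2)             # loads e
print(store.rows)  # ['A', 'B', 'C', 'D', 'E']
```

Because the read position is derived only from the committed offset, a failed batch is simply re-read on the next attempt.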
MemSQL 5.5 introduces a new hash table design that, combined with Bloom filters, makes hash joins up to 3x-5x faster.
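These notes do not describe MemSQL's new hash table. The Python sketch below shows only the general technique of a hash join assisted by a Bloom filter. The build side is loaded into a hash table and a Bloom filter at the same time. Probe rows whose key the filter rejects are discarded without probing the hash table. The filter sizes and all names here are arbitrary, hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; num_bits and num_hashes are arbitrary sketch values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means the key is definitely absent; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(key))


def hash_join(build_rows, probe_rows, key):
    """Inner equi-join: build on the smaller input, probe with the larger one."""
    table, bloom = {}, BloomFilter()
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
        bloom.add(row[key])

    joined, skipped = [], 0
    for row in probe_rows:
        if not bloom.might_contain(row[key]):
            skipped += 1  # rejected by the filter, hash table never probed
            continue
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined, skipped


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bo"}]
orders = [{"cust_id": i % 50, "order_id": i} for i in range(100)]
joined, skipped = hash_join(customers, orders, "cust_id")
print(len(joined), skipped)
```

A Bloom filter can return false positives but never false negatives, so filtering never drops a valid match. In a distributed engine the payoff is typically larger than in this sketch, because a compact filter can reject probe rows before they are shipped or fully processed.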
More efficient use of threads and connections increases the performance of distributed joins and allows more queries to run concurrently.
MemSQL now delivers native support for executing certain correlated subqueries as semi-joins and anti-joins.
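A correlated subquery such as SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.id = c.id) asks only whether a match exists. It does not ask how many matches there are. A semi-join returns each outer row at most once. An anti-join, the NOT EXISTS form, returns the outer rows with no match. The Python sketch below shows those semantics using hash sets. The tables and function names are hypothetical, and the sketch says nothing about how MemSQL executes these joins internally.

```python
def _inner_keys(inner, key):
    # SQL NULL never equals anything, so NULL keys on the inner side can never match.
    return {row[key] for row in inner if row[key] is not None}


def semi_join(outer, inner, key):
    """Rows of outer with at least one match in inner (EXISTS); each emitted at most once."""
    keys = _inner_keys(inner, key)
    return [row for row in outer if row[key] in keys]


def anti_join(outer, inner, key):
    """Rows of outer with no match in inner (NOT EXISTS)."""
    keys = _inner_keys(inner, key)
    return [row for row in outer if row[key] not in keys]


customers = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": None}]
orders = [{"id": 1}, {"id": 1}, {"id": 3}, {"id": None}]
print([c["id"] for c in semi_join(customers, orders, "id")])  # [1, 3]
print([c["id"] for c in anti_join(customers, orders, "id")])  # [2, None]
```

Customer 1 appears once in the semi-join result even though it has two orders. An ordinary inner join would return it twice.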
The query profiling feature lets customers run a query with the PROFILE option to examine the execution details of each step of the query, including the number of rows processed and the actual processing time. The query profiler can help identify performance issues down to particular steps in query execution. See the PROFILE documentation to learn more.
System administrators face the challenging task of coordinating query workloads across a large user base. The workload manager prevents system overload by limiting the resources used by the queries executing at any given time. It queues queries that cannot run immediately and runs them later, when capacity becomes available. See the Workload Management documentation to learn more.
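The queueing behavior described above can be sketched as a simple admission controller. As a simplifying assumption, this sketch limits only the number of concurrently running queries, whereas the workload manager limits resource usage. WorkloadManager, submit, and finish are hypothetical names, not MemSQL APIs.

```python
from collections import deque


class WorkloadManager:
    """Hypothetical admission controller: at most max_concurrent queries run at once."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.running = set()
        self.queue = deque()  # FIFO of queries waiting for capacity

    def submit(self, query_id):
        if len(self.running) < self.max_concurrent:
            self.running.add(query_id)
            return "running"
        self.queue.append(query_id)
        return "queued"

    def finish(self, query_id):
        self.running.remove(query_id)
        # Capacity was freed: admit the oldest queued query, if any.
        if self.queue:
            next_query = self.queue.popleft()
            self.running.add(next_query)
            return next_query
        return None


wm = WorkloadManager(max_concurrent=2)
print([wm.submit(q) for q in ["q1", "q2", "q3"]])  # ['running', 'running', 'queued']
print(wm.finish("q1"))  # q3
```

A FIFO queue keeps admission order fair. A real system might also weigh each query's estimated memory or thread needs before admitting it.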
MemSQL Ops provides administration for MemSQL clusters. MemSQL Ops for MemSQL 5.5 now supports clusters of more than 100 nodes. Additionally, MemSQL Ops provides a user interface for the newly introduced MemSQL Pipelines. For more information, see MemSQL Ops Releases.
In the event of a node failure, MemSQL now offers faster recovery for row stores with secondary indexes.
Reference tables are replicated to every node, so joins against them are co-located and require no data reshuffling. MemSQL 5.5 adds support for reference tables in the column store.
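The Python sketch below illustrates why a reference table avoids reshuffling. A distributed table is hash-partitioned across leaves on one key but joined on a different key. Every leaf holds a full copy of the reference table, so each leaf can join its own partition locally, and the union of the local results equals the global join. The cluster size, tables, and function names are hypothetical.

```python
from collections import defaultdict

NUM_LEAVES = 3  # hypothetical cluster size


def partition(rows, shard_key, num_leaves=NUM_LEAVES):
    """Hash-partition a distributed table across leaves."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[shard_key]) % num_leaves].append(row)
    return parts


def local_join(fact_rows, ref_rows, key):
    """Join one leaf's partition against that leaf's full copy of the reference table."""
    ref_index = {r[key]: r for r in ref_rows}  # assumes key is unique in ref_rows
    return [{**f, **ref_index[f[key]]} for f in fact_rows if f[key] in ref_index]


def colocated_join(fact_rows, ref_rows, key, shard_key):
    parts = partition(fact_rows, shard_key)
    result = []
    for leaf in range(NUM_LEAVES):
        # Each leaf already holds every reference row, so no fact rows
        # move between leaves even though the join key is not the shard key.
        result.extend(local_join(parts[leaf], ref_rows, key))
    return result


sales = [{"sale_id": i, "region_id": i % 2} for i in range(6)]
regions = [{"region_id": 0, "region": "EU"}, {"region_id": 1, "region": "US"}]
joined = colocated_join(sales, regions, "region_id", shard_key="sale_id")
print(sorted((r["sale_id"], r["region"]) for r in joined))
# [(0, 'EU'), (1, 'US'), (2, 'EU'), (3, 'US'), (4, 'EU'), (5, 'US')]
```

The tradeoff is storage. Each reference row is stored on every node, which suits small, slowly changing dimension tables.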
It is now possible to debug LOAD DATA processes in MemSQL, which improves developer productivity. The LOAD DATA ... IGNORE command stores the errors it encounters during execution. These errors can be displayed with the new SHOW LOAD ERRORS command and used for subsequent corrective action. Errors are stored in an in-memory buffer that is cleared when a subsequent query is run.
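The Python sketch below models the behavior described above. Bad rows are skipped but recorded, the recorded errors can be listed afterward, and the buffer is discarded when another query runs. The Session class and its methods are hypothetical stand-ins, not MemSQL APIs. The two-column row format is also an assumption.

```python
import csv
import io


class Session:
    """Hypothetical session modeling the LOAD DATA ... IGNORE error buffer."""

    def __init__(self):
        self.table = []
        self.load_errors = []  # in-memory error buffer

    def load_data_ignore(self, text):
        """Load (id, name) rows, skipping and recording rows that fail to parse."""
        self.load_errors = []
        loaded = 0
        for line_no, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
            try:
                row = (int(fields[0]), fields[1])
            except (ValueError, IndexError) as exc:
                # IGNORE semantics in this sketch: skip the row but remember why.
                self.load_errors.append((line_no, ",".join(fields), str(exc)))
                continue
            self.table.append(row)
            loaded += 1
        return loaded

    def show_load_errors(self):
        return list(self.load_errors)

    def run_query(self, sql):
        # Stand-in for any later query: the error buffer is discarded.
        self.load_errors = []


s = Session()
loaded = s.load_data_ignore("1,alice\nx,bob\n3\n4,dan\n")
print(loaded)  # 2
print([(n, text) for n, text, _ in s.show_load_errors()])  # [(2, 'x,bob'), (3, '3')]
```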
The changelog below contains MemSQL Database improvements and bug fixes introduced in maintenance or revision releases. For a similar list for MemSQL Ops, see MemSQL Ops Releases.
- Fixed a deadlock issue that could occur in rare situations for distributed join queries with both wide rows and skew in the data distribution.
- Improved columnstore table replication performance.
- Fixed an instability issue that could occur when adding JSON columns to a columnstore table.
- Fixed a MemSQL startup issue that affects older hardware without SSE4 support.
- Fixed a backup failure issue for columnstore tables that occurs if an ALTER ADD COLUMN statement is executed just prior to backup.
- Fixed a performance issue with
- Fixed an instability issue when executing distributed join queries that use certain table aliases.
- The plan cache now records each query's CPU execution time in milliseconds. For example, the results of the SHOW PLANCACHE; command include a CpuTime column for each query.
- Fixed a performance issue when executing a
- Fixed a performance issue when executing an
- Fixed a metadata corruption issue that could occur after executing an
- Fixed an issue where distributed join (broadcast) operations could hang after cluster failover.
- Fixed an issue where users with correct permissions could encounter "access denied" error messages during pipeline creation or pipeline updates.
- Enabled multi-partition queries for the first statement of a multi-statement transaction, reducing network traffic overhead in some cases.
- Fixed an error message for CREATE TABLE AS ... SELECT NULL statements, which could cause cluster instability.
- Fixed an issue when recovering a cluster with redundancy_level set to 2. Now, when auto-attaching a leaf, the cluster does not rebalance until after the leaf has recovered.
- Columnstore tables now replicate much faster.
- Pipelines transforms now execute much faster.
- Performance has been improved during cluster failover by reducing the duration of partition locks.
- Fixed an issue where broadcast operations on temporary tables could fail with an error in some cases.
- Fixed an issue with queries that require many seek operations.
- Fixed an issue with ALTER TABLE CHANGE, when used to rename a column, that could cause table corruption on recovery.
- PROFILE is now more accurate for columnstore scans.
- Added the global variable enable_multipartition_queries. Setting enable_multipartition_queries = false disables parallel query execution for leaf nodes.
- Fixed an instability and accuracy issue that could occur for users upgrading query optimizer statistics from 5.1 to 5.5.