MemSQL 5.7 brings a number of improvements to query processing and performance, new query processing extensions, management enhancements, a new
DATE_TRUNC function, and support for Amazon S3 as a data source for MemSQL Pipelines.
The MemSQL Terms of Service for this software have changed. By downloading and/or using the software, you acknowledge you have read and agree to the MemSQL terms of service.
Amazon S3 support has been added for MemSQL Pipelines. MemSQL can now ingest objects from S3 buckets in a massively parallel way with exactly-once semantics. See the S3 Pipelines docs for more information, including a quickstart for how to start using an S3 pipeline immediately.
Query performance has been improved through enhancements to the query optimizer:
- Scans with impossible predicates are optimized out in more situations. For example, a query like SELECT * FROM T WHERE 0=1 does not scan T; it returns immediately.
- INSERT INTO … SELECT … queries with a computed shard key column on the target table can now be executed with a reshuffle, which improves performance when inserting large amounts of data.
- Performance has been improved for joins using the null-safe equals operator (<=>).
- Performance has been improved for queries like SELECT COUNT(DISTINCT c1) FROM t GROUP BY c2 via an internal optimizer query rewrite.
- Memory usage and performance have been improved for queries with multiple COUNT(DISTINCT …) expressions, via an internal optimizer query rewrite.
- Short-circuit evaluation for LIMIT queries has been improved. The aggregator node now terminates computations on leaf nodes as soon as the limit is reached.
- Fewer reshuffles are performed when inserting rows into a table that are selected from the same table.
- DATE_FORMAT is automatically rewritten to DATE_TRUNC when applicable, to improve query performance, since the string manipulation done by DATE_FORMAT is more expensive than truncating the date. For example, DATE_FORMAT(date_col, "%Y-%m-01 00:00:00") can be automatically rewritten to DATE_TRUNC("month", date_col) in situations where its result is immediately converted back to a date.
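The release notes do not say what the internal COUNT(DISTINCT …) rewrite looks like. As a rough illustration only, the sketch below checks one well-known rewrite of this shape (deduplicate with an inner GROUP BY, then count) against the original query. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption made for convenience; the table contents are made up, and nothing here claims to be the rewrite MemSQL actually applies.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER, c2 TEXT)")
conn.executemany(
    "INSERT INTO t (c1, c2) VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "a"), (None, "a"), (3, "b"), (3, "b")],
)

# Query shape taken from the release notes (c2 added to the output for clarity).
original = """
    SELECT c2, COUNT(DISTINCT c1)
    FROM t
    GROUP BY c2
    ORDER BY c2
"""

# Hypothetical rewrite: the inner GROUP BY removes duplicate (c2, c1) pairs,
# so the outer query can use a plain COUNT, which skips NULLs just like
# COUNT(DISTINCT ...) does.
rewritten = """
    SELECT c2, COUNT(c1)
    FROM (SELECT c2, c1 FROM t GROUP BY c2, c1) AS d
    GROUP BY c2
    ORDER BY c2
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

In a distributed setting, a two-level form like this can be attractive because the deduplication step can be spread across nodes before the final count, but whether that is the motivation here is not stated in these notes.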
A LOAD DATA … SKIP … ERRORS option has been added to skip loading of rows with errors. See the updated LOAD DATA docs for more information and a comparison between this feature and
- Window functions with PARTITION BY in distributed joins are now supported.
- Support for correlated subqueries in distributed plans has been improved.
The DATE_TRUNC(period, timestamp) built-in function is now supported.
DATE_TRUNC returns the beginning of the specified period (e.g. the start of Monday for the period "week"). It supports grouping on rounded time buckets for convenient summary reports by day, week, etc. See the DATE_TRUNC docs for more information.
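To make the truncation semantics concrete, here is a minimal Python sketch of what "the beginning of the specified period" means. The helper `date_trunc` is a hypothetical illustration that mirrors the (period, timestamp) argument order given above; it is not MemSQL code, and the set of periods it accepts is an assumption. The last lines also show why formatting a timestamp with "%Y-%m-01 00:00:00" and parsing it back yields the same value as truncating to the month.

```python
from datetime import datetime, timedelta


def date_trunc(period: str, ts: datetime) -> datetime:
    """Return the start of the period containing ts (illustrative helper)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "day":
        return midnight
    if period == "week":
        # weekday() is 0 for Monday, so this steps back to Monday 00:00.
        return midnight - timedelta(days=ts.weekday())
    if period == "month":
        return midnight.replace(day=1)
    if period == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unsupported period: {period}")


ts = datetime(2017, 3, 16, 14, 30, 5)  # a Thursday
week_start = date_trunc("week", ts)    # Monday of that week, at midnight
month_start = date_trunc("month", ts)  # first day of that month, at midnight

# Formatting with a fixed day and time, then parsing the string back,
# gives the same result as truncating to the month.
round_trip = datetime.strptime(
    ts.strftime("%Y-%m-01 00:00:00"), "%Y-%m-%d %H:%M:%S"
)
print(week_start, month_start, round_trip == month_start)
```

Grouping rows by the truncated value is what produces the day or week buckets for summary reports: every timestamp inside the same period maps to the same bucket start.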
A distributed plan cache has been made available in information_schema. Two new system views are available:
- distributed_plancache displays plan caches for all nodes in the cluster. It contains rows for both leaf and aggregator queries.
- distributed_plancache_summary has one record per query for the whole cluster. It sums up activity over all nodes.
- Full query text including parameters is now displayed in results for SHOW PROCESSLIST and queries on information_schema.processlist. In MemSQL 5.5 and older, only the parameterized query text is displayed. To control the visibility of query parameters, a new global variable named show_query_parameters has been added. For more information, see SHOW PROCESSLIST.
- A lightweight process ID (thread ID) column LWPID has been added to the
The changelog below contains MemSQL Database improvements and bug fixes introduced in maintenance or revision releases. For a similar list for MemSQL Ops, see MemSQL Ops Releases.
- Fixed an issue where S3 pipelines would indefinitely suspend data extraction if connection issues or throttling occurred between Amazon S3 and MemSQL.
- Fixed an issue where users with REQUIRE SSL permissions could not execute
- Fixed an issue where SHOW STATUS EXTENDED reported inaccurate values.