7.0 Release Notes
The main features of the MemSQL 7.0 release are highlighted below.
Much improved performance of synchronous replication and durability. Our synchronous replication now has only a minor performance penalty compared to our already fast asynchronous replication – giving you better data consistency guarantees without incurring large performance penalties. You can read more about our improved synchronous replication here.
Selective queries on columnstore tables now run faster, via sub-segment access.
Hash indexes on columnstore tables are now supported.
ALTER TABLEnow support adding and dropping hash indexes on columnstore tables.
DELETEqueries now lock columnstore tables at the row level, allowing improved concurrency. Previously, these queries locked columnstore tables on the segment level.
Sparse rowstore compression is now supported.
Queries on columnstore tables can now reorder filters to decrease execution time. Reordering occurs automatically and allows filters that are more selective to be evaluated first.
To deploy this release, follow the appropriate guide.
To upgrade a self-managed install to this release, follow this guide.
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
In addition to the release highlights noted above, the following new features and improvements are available in this release:
- Improved Workload Management implementation that no longer needs to be tuned via the
workload_management_expected_aggregatorsengine variable; this variable is now deprecated.
- Support for multi-statement transactions that use distributed joins.
- Support for multi-statement transactions that write to a database and then read from a different database.
- Added a session variable
node_degree_of_parallelismthat specifies the number of threads per node to use for columnstore table scans.
- Queries that use the
INfilter on shard keys now perform better.
- Hash joins that use reference tables now perform better.
data_conversion_compatibility_levelengine variable can now be set to
7.0. This allows errors to be thrown for more invalid type conversions, instead of doing implicit type conversions, where the user could see unexpected results.
- Now, a maximum of 1024 tables can be written to in a transaction.
- Column statistics are now automatically gathered on rowstore tables. This is in addition to column statistics being automatically gathered on columnstore tables, which was done previously.
- Range statistics (histograms) are now automatically gathered on rowstore and columnstore tables.
- Rowstore table sampling has been improved.
- Now, a true row-level random sample of rows from columnstore tables is maintained automatically and used to estimate the selectivity of complex predicates.
- All shapes of bushy joins are now supported.
- Common Table Expressions (CTEs) can now be materialized; redundant expressions can use cached data instead of being recomputed.
Data Storage, Loading, and Backup
- Performance improvements for bulk inserts, large insert/selects,
LOAD DATA, and pipelines to columnstore tables (particularly for wide columnstore tables).
- Added support for publishing data to Kafka via SELECT … INTO KAFKA ….
- Added support for Parquet in
- Improved error reporting for pipelines and
- Incremental backup for columnstore data.
- Added a column to
information_schema.TABLE_STATISTICSto indicate if a table is rowstore or a columnstore.
- Added columns to the
- Added the
- Cross-database views are now supported.
- Added built-in support for time series reporting through the new
GROUP_CONCAT()now supports the
- Added the
Maintenance Release Changelog
2020-08-24 Version 7.0.22
- Fixed an issue where
REPLICATE DATABASEwould fail in cases where no replicated database existed yet and the node containing the master aggregator was previously replaced with a new node.
SHOW PROCESSLISTreturns the IP address of the client (in the Host column) if the host name cannot be resolved.
- Fixed an issue that occurred when the engine variable
interpreter_modewas set to
interpret_first, where the first run of the query remained in interpreted mode even after compilation had finished.
- Added support for
ALTER PIPELINE ... SET RESOURCE POOL.
- Reduced the CPU usage required for many threads to allocate memory for variable length strings.
- Added a new engine variable
convert_nonunique_hash_to_skiplistwhich, when set to
ON, creates a skiplist index in place of any non-unique hash index. By default, the variable is set to
- Pipelines no longer stop executing if free disk space drops lower than
5 * <the value of the minimal_disk_space engine variable>. They now stop executing if free disk space drops lower than
400 mb + <the value of the minimal_disk_space engine variable>.
- Improved the error message returned when
REMOVE AGGREGATORis run on the master aggregator.
- The behavior of the
max_compilation_time_sengine variable no longer applies to DDL queries. Eliminating compilation timeouts for these queries, especially for
ALTER TABLE, will prevent the queries from potentially running forever.
- Fixed a compatibility issue that occurred when running newer versions of SAP Business Object Data Services (BODS) against MemSQL. The issue was that the command
SET sql_mode = <expression>was not supported by MemSQL.
- Fixed an issue where auto-attach would fail to attach a partition but succeed in bringing the leaf online, resulting in offline partitions.
- Ensured that any re-used existing replica partition is still forced in sync as part of the copy operation run as a step of a
REBALANCEoperation. Previously, this was a cause of failure in cases where
REBALANCEfailed once prior.
- Fixed a failure when running a
ROLLUPquery over a constant expression.
- Disallowed parameter names with
.in them when defining a function or stored procedure.
2020-07-20 Version 7.0.21
- Fixed an issue where joins with reference tables are sometimes executed using Cartesian joins, even when faster joins are possible.
- Improved the performance of rebalance partition operations that copy partitions with many small columnstore BLOB files.
- Fixed an issue where the engine variable
explicit_defaults_for_timestampcould be set to a different value on new nodes added to the cluster, as compared to the existing value already set on existing nodes. This would cause the error “partition’s table metadata are out of sync” when queries were run.
ANALYZE TABLEnow requires the
ALTER) permissions to execute.
- Improved the performance of batch deletes on columnstores. The increase in performance depends on the encoding of the columns.
preserve_original_colstore_jsonis set to
ON, null values and empty arrays are preserved in JSON columns in columnstores.
2020-06-29 Version 7.0.20
- Fixed an issue where user-defined tables having the same name as system tables, such as
USERS, were treated as case-sensitive when
table_name_case_sensitivitywas set to
- SSL ciphers that use elliptic curves are now supported.
- Added an optimizer setting that allows you to adjust the cost of performing a cross join.
STATEcolumn in the
information_schema.PIPELINESview now shows a pipeline in an error state when the pipeline stops due to insufficient disk space.
- Fixed an issue where running
STOP REPLICATINGwould sometimes cause replica partitions, after becoming primary partitions, to replicate asynchronously when the original primary database used synchronous replication.
- Improved predicate transitivity rewrites to avoid adding redundant predicates. This fixes compile timeouts for some queries with large numbers of predicates.
- Fixed an issue where filters were sometimes not being used as hash join conditions if they involved expressions. This was causing filters to be evaluated on more rows than needed.
2020-06-09 Version 7.0.19
- Correctly generate an error when running
ALTER VIEWon a schema-bound view v1, where a schema-bound view v2 refers to v1.
- Fixed an auto-attach failure in specific failure conditions where a node quickly got marked offline and then online by the aggregator, but the node itself did not restart.
- Improved the parallelism of heartbeat reconnects. This fixed erroneous node failures in cases of issues causing slow connections, when multiple nodes were unhealthy.
- Fixed a crash where
INSERT ... SELECT ... ON DUPLICATE KEY UPDATEreferred to tables not being written to.
2020-05-26 Version 7.0.18
- Added the engine variable
json_extract_string_collation. When the variable is set to
AUTO(the default setting) or
JSON, the extracted string has the same collation as the JSON object that is being extracted from. When the variable is set to
server, the extracted string has the same collation as the
- Reduced memory usage in certain outer join queries that use single table or constant filters.
- Fixed an issue where views using set operations (such as
MINUS) could be parsed incorrectly when being queried.
- Fixed an issue with
INSERT .. SELECTqueries that use generator functions (functions that are computed on the aggregator, with the results sent to the leaves).
- Fixed an intermittent incorrect permission denied error that occurred when views were used in queries on secondary clusters.
- Improved the performance of the query that is run on
information_schema.OPTIMIZER_STATISTICSby MemSQL Ops schema monitoring.
2020-04-27 Version 7.0.16
- Added the
innodb_strict_modeengine variable. The
django-mysqllibrary uses the value of this variable to connect to MemSQL.
- Improved the efficiency of locking done by the security manager to allow more concurrent login/logout operations.
- Added an error message to return when a command fails due to a bug: “Operation
<operation name>failed without specifying a detailed error message”.
- Fixed an issue where the security checks done for temporary tables could result in an incorrect access denied error.
- When external authentication to MemSQL (via PAM, Kerberos, etc.) is slow, track the time needed to authenticate and add logging to the tracelog. This is done for troubleshooting purposes.
- Now, the
RESTOREcommand can restore a 6.x backup to a later version, using a different database name than was specified when the database was backed up.
- Fixed an issue where views could not be queried after being created, due to a parsing error.
2020-04-06 Version 7.0.15
- Fixed an issue where synchronous replication incorrectly consumed all of the disk space on the host containing either the master or replica partitions. The issue occurred when the host containing the replica partitions had a specific type of network connectivity problem with the host containing the master partitions.
- Fixed an issue that could cause an online upgrade to fail if a reference database needs to reprovision in the middle of the upgrade.
- Fixed an issue where incorrect reuse of query plans stored on disk could cause the error “No function with id 23003 exists”.
- You can specify that the
SHOW VARIABLEScommand returns only cluster-wide variables.
- Fixed a bug where dropping a secondary index on a columnstore table and then adding another secondary index on the table (via
ALTER TABLE) causes the secondary indexes on the table to become corrupted after a node restart.
- Fixed a bug where using
ALTER TABLEto add a secondary index to a columnstore table, truncating the table, and then restarting the node would cause the added secondary index to be disabled and alters on the table to crash the node.
- Added Google Cloud Storage (GCS) pipelines to the
- Added the
histogramcolumn to the
- Added the
pathcolumn to the
information_schema.PLANCACHEview. This column contains the path to the file where a query plan is stored on disk.
2020-03-23 Version 7.0.14
- Avoid inaccurate sampling estimates for expressions that use
- Fixed a crash that occurred when
PERCENTILE_CONT()were used with a sub-select.
int96support to Parquet pipelines.
- When an
EXPLAINquery runs, return a warning when the query is run on tables containing zero rows.
- Fixed a crash that occurred during query rewrites for complex
DELETEqueries having subselects.
- Fixed an error that occurred when a plan was dropped from the plancache during a distributed join.
- Added clearer error messages when queries are unable to run due to workload manager or resource governor settings. Also, added better logging for these errors.
- Queries that are built from
TO_QUERY()and query type values are parameterized when they run. Parameterization of these queries allows plans to be reused in the plancache.
- Added support for pipelines that extract data from Google Cloud Storage (GCS).
- Fixed a locking issue that could cause dropping a temporary table to fail.
2020-03-09 Version 7.0.13
- Now, queries inside of stored procedures that use variables run as single partition queries if the queries only need to access data within single partitions. Previously, these types of queries did not run as single partition queries.
- Fixed an issue where
CLEAR ORPHAN DATABASESdid not handle table names that require escaping, such as table names prefixed with
ANALYZE TABLEincorporates the behavior of the
- Now, call the
malloc_trim()function less aggressively. This function shrinks the memory cached by libc malloc.
- Added the columns
information_schema.TABLES. These columns were previously returned when running
SHOW TABLES, only.
- Now, improve the performance of queries that use hash outer joins with some types of table filters.
2020-02-18 Version 7.0.12
- Fixed an incorrect “Transaction rolled back mid-query” error that could occur during a multi-statement transaction if the transaction was started on at least 64 partitions.
- Fixed an error that occurred when
SELECT ... GROUP BY ... ORDER BY RAND()was run on a columnstore table.
- Now, increase the number of columns that
SELECTqueries can return.
- Removed a limitation on the number of columns in a table allowed for columnstore sampling. Increased the number of keys allowed for optimized columnstore JSON storage in some cases.
- Fixed a deadlock that is possible when the
RESTORE DATABASEcommand fails and attempts to remove the restored database partitions that were created before the command failed.
- Now, cluster-wide operations that take a global lock, such as
REBALANCE, update the
information_schema.PROCESSLIST. The update indicates that a command is blocked on the lock.
2020-01-27 Version 7.0.11
- Added the
columnstore_sample_per_partition_limitengine variable. This variable controls the maximum number of rows sampled per partition for columnstores.
- Now, stop examining samples of columnstore tables that have many columns. Continue collecting samples for these types of tables.
- Fixed an issue with kerberos-related
CONFIGclause keys in
CREATE PIPELINE ...being considered invalid by HDFS pipelines; added the
allow_unknown_configsoption to such pipelines to skip
CONFIGclause validity checks.
- Now, periodically shrink the memory cached by libc malloc by calling
- Now, the
table_name_case_sensitivityengine variable can be used to change whether MemSQL treats tables, views, and table aliases as case-sensitive or case-insensitive. This variable can only be changed if no user databases have been created on the cluster.
- Fixed a crash that occurred when running
UPDATE ... SET ...which sets a sparse
VARCHARcolumn that had been previously assigned to earlier in the same
UPDATE ... SET ...query. Fixed the same issue that occurred when setting a sparse
- Fixed an issue where cross-database distributed joins could fail to cleanup the temporary result tables created to run the join.
2020-01-13 Version 7.0.10
- Now, the
REPLICATE DATABASEcommand requires a primary database and its secondary database to have the same durability setting (
- Fixed an issue that caused
STOP REPLICATINGto encounter an error after a database was upgraded from MemSQL 6.x to MemSQL 7.0.
LOAD DATAincorporates the behavior of the
- Fixed a crash that sometimes occurred when calling the
MAXfunction on an
- Now, HDFS Pipelines skip files in the data source that are under the
- Added the clause
NULL DEFINED BY ... OPTIONALLY ENCLOSEDto the syntax of
LOAD DATA .... When this clause is used,
NULLvalues enclosed with quotes as
- Now, snapshot reference databases after
ALTER TABLEis run. Also, snapshot all databases after
- Fixed a memory leak in queries that use full-text indexes.
- Fixed an issue with database replication that occurred when the
snapshots_to_keepengine variable is set to 1.
- Now, delete the existing samples associated with a table when columnstore sampling on the table is disabled. Also, reset the sampling flag on all segments associated with the table so they will be resampled if columnstore sampling is re-enabled.
2019-12-10 Version 7.0.9
- Initial GA release of MemSQL 7.0