6.5 Beta Release Notes
The MemSQL 6.5 Beta releases include new features and functionality for you to experiment with in your test environments. Functionality may change between now and the next major release of MemSQL.
See the descriptions and changelog below for more information on these new features.
HDFS Pipeline Support
A new HDFS pipeline allows you to perform ETL operations on files from Hadoop, Pig, or Spark jobs. See HDFS Pipelines Overview for more details.
Kerberos and SSL support for Kafka Pipelines
You can now configure Kafka pipelines to connect to your Kafka brokers through SSL and optionally authenticate with Kerberos. See Enabling SSL and Kerberos on Kafka Pipelines for more details.
Pipeline Support with Stored Procedures
In previous versions of MemSQL, you could only insert into one table for each pipeline that you created. In MemSQL 6.5 and later, you can insert into multiple tables from one pipeline by specifying a stored procedure in a CREATE PIPELINE command.
Pipeline support for stored procedures also enables new scenarios such as data transformation in SQL. See the INTO PROCEDURE section of CREATE PIPELINE for more details.
Performance improvements for loading into S3 Pipelines
S3 pipelines now have the ability to process new files in a bucket at a much faster rate without incurring the penalty of listing all of the files in the bucket before performing the ETL operations of the pipeline. Previously, this could be a slow operation if you had a bucket with a large file count.
The only requirement to enable this new functionality is to prefix filenames in your bucket with an increasing alphanumeric value, such as a timestamp or some other marker (e.g. YYYY-MM-DD-filename.extension). You do not need to add any configuration elements to your CREATE PIPELINE statement.
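As a rough illustration of the naming requirement, the Python sketch below builds object keys with a zero-padded date prefix so that sorting the keys as strings matches chronological order. The helper name `prefixed_key` and the sample filenames are hypothetical; only the YYYY-MM-DD-filename.extension pattern comes from the text above.

```python
from datetime import date


def prefixed_key(day: date, filename: str) -> str:
    """Return a key of the form YYYY-MM-DD-filename.extension.

    Hypothetical helper: zero-padded ISO dates sort as strings in
    chronological order, which gives an increasing prefix.
    """
    return f"{day.isoformat()}-{filename}"


# Sample uploads, deliberately listed out of chronological order.
uploads = [
    (date(2018, 6, 15), "events.csv"),
    (date(2018, 3, 5), "events.csv"),
    (date(2018, 5, 17), "events.csv"),
]

keys = [prefixed_key(day, name) for day, name in uploads]

# Sorting the keys as plain strings recovers chronological order.
sorted_keys = sorted(keys)
```

If several files can arrive on the same day, a finer-grained timestamp prefix would be one way to keep the prefix strictly increasing.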
Backup and Restore from S3
- Performance improved for columnstore compression
- Support for using .tar archives for columnstore backups
- Ability to debug and test out transforms by running new EXTRACT PIPELINE INTO OUTFILE command
- New PROFILE PIPELINE command useful for debugging pipeline bottleneck issues
- Performance improvements for some of the pipelines and columnar information_schema tables
- Improved performance for LOAD DATA when loading CSVs that contain large numbers of columns
- Added “dry run” option for
- The new BACKUP_HISTORY table provides important metadata on recent successful backups.
- The new LMV_EVENTS table provides a history of cluster-level events such as nodes attaching to a cluster or the rebalancing of partitions
- The new LOAD_DATA_STATUS table tells you how much data has been ingested during a
Query Language Additions
FULL OUTER JOIN now supported
See SELECT for more details on usage.
You can now transform non-aggregated event data into a pivot table output format using the PIVOT clause in a SELECT statement. See PIVOT for more details.
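To make the shape of a pivoted result concrete, here is a small plain-Python model of the transformation: one output row per group, one output column per pivot value, and an aggregate (a sum here) in each cell. This is not MemSQL syntax. The `pivot_sum` function and the sample sales rows are assumptions made purely for illustration.

```python
def pivot_sum(rows, row_key, pivot_key, value_key, pivot_values):
    """Sum value_key per (row_key, pivot_key) and spread pivot values into columns.

    Illustrative model only: cells with no matching input rows stay None,
    and rows whose pivot value is not in pivot_values are skipped.
    """
    table = {}
    for row in rows:
        column = row[pivot_key]
        if column not in pivot_values:
            continue
        cells = table.setdefault(row[row_key], {p: None for p in pivot_values})
        cells[column] = (cells[column] or 0) + row[value_key]
    return table


# Hypothetical non-aggregated event data: one row per sale.
events = [
    {"store": "north", "quarter": "Q1", "amount": 10},
    {"store": "north", "quarter": "Q1", "amount": 5},
    {"store": "north", "quarter": "Q2", "amount": 7},
    {"store": "south", "quarter": "Q2", "amount": 3},
]

# One entry per store, with one column per listed quarter.
pivoted = pivot_sum(events, "store", "quarter", "amount", ["Q1", "Q2"])
```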
Full Text Search
Full text search allows searching for words or phrases in columnstore table columns with a large body of text. See Full Text Search for more details.
- Integer run-length encoding is now supported for encoded filters. See Data Encodings Supported for more details.
- Improved correlated subselect support by removing restrictions and allowing dependent fields in the left expression of an
- Supports inserting into a table with a computed column as the shard key when ignore_insert_into_computed_column = ON.
- New data conversion functionality that throws errors for integer under/overflow and string truncation issues. Controlled through the new system variable data_conversion_compatibility_level. See System Variables for more information.
- Improved performance for queries that move large amounts of data between nodes
- Improved performance of GROUP BY with many groups
- Improved performance for queries that use windowed aggregate functions combined with filters
- Improved histograms and cardinality estimation
- Improved selection of encoded
- Improved cardinality estimation for
- Support for multiple result sets from a single stored procedure. See CREATE PROCEDURE for more details.
- MemSQL now supports certain DDL statements inside stored procedures. See CREATE PROCEDURE for more details.
SSL for cross-cluster communication decoupled from SSL for intra-cluster communication
You can now configure your cluster to use SSL for communication between clusters during replication only, and have it off for local communication between nodes. This can be useful if the performance cost of securing intra-cluster communication is too high for your workload. See Server Configuration for Secure Client and Intra-Cluster Connections for more details.
Memory limits can now be set to prevent unintended queries from consuming all available query execution memory in the cluster. See Setting Resource Limits for more details.
MemSQL now estimates the amount of memory required to execute queries and only runs those queries if sufficient memory is available. See Workload Management for more information.
Synchronize User Permissions Across the Cluster
You can now set and update cluster-wide user permissions on the master aggregator and have those changes synchronized to all nodes in your cluster by setting
ON in your master aggregator.
AUTO_INCREMENT behavior during cluster restarts
Maintenance Release Changelog
2018-06-15 Version 6.5.6-beta3
- Reduced number of threads used in S3 backup and restore operation.
- Reduced stack space usage for certain data structures, which lowers the probability of encountering stack overflows when executing user-defined functions (UDFs) and stored procedures.
- Improved handling of
2018-06-06 Version 6.5.5-beta3
- Offline upgrade to MemSQL 6.5 is now supported. See Upgrading to MemSQL 6.5 for more details.
2018-05-17 Version 6.5.4-beta2
- Extensive list of improvements and new features. See the descriptions above for more information.
2018-03-05 Version 6.5.1-beta1
- New full text search functionality now available in beta.