Columnstore

MemSQL supports storing and processing data using two types of data stores, a completely in-memory rowstore and a disk-backed columnstore. The MemSQL rowstore and columnstores differ both in storage format (row, column) and in storage medium (RAM, disk - with flash or SSD recommended). MemSQL allows joining rowstore and columnstore data, thus allowing you to merge real-time and historical data together in a single query.

The rowstore is typically used for highly concurrent OLTP and mixed OLTP/analytical workloads. The columnstore is used primarily for analytical applications where the queries mainly involve aggregations over large datasets that do not fit in memory. In these cases the columnstore performs much better than the rowstore.

The decision framework for choosing a rowstore or a columnstore is included later in this section. In general, since rowstores support a larger variety of workloads, they are a good starting point.

This section provides a conceptual overview of MemSQL’s columnstore and includes considerations for optimizing your database performance using the columnstore.

Creating a Columnstore Table

The columnstore is enabled by adding a CLUSTERED COLUMNSTORE index to a table. If a clustered columnstore index is defined on a table, the table itself is physically stored using the columnstore. Currently, MemSQL supports one columnstore index per table. Combining columnstore indexes with other index types, except shard keys, is not currently supported.

Here is an example of a query that creates a columnstore table:

CREATE TABLE products (
     ProductId INT,
     Color VARCHAR(10),
     Price INT,
     dt DATETIME,
     KEY (`Price`) USING CLUSTERED COLUMNSTORE ,
     SHARD KEY (`dt`, `ProductId`)
);

As with any other MemSQL table, we define a SHARD KEY to explicitly control the data distribution. We define it on ProductId since sharding on a high cardinality identifier column generally allows for a more even distribution and prevents skew. It is also possible to randomly distribute data by either omitting the shard key, or defining an empty shard key SHARD KEY().

To create a columnstore table, you must include a CLUSTERED COLUMNSTORE key in the definition. Note that you can only have one CLUSTERED COLUMNSTORE key as well as the SHARD KEY. With the exception of metadata-only unenforced unique keys, you cannot add any other key, including PRIMARY KEY, to the columnstore table.

If a CLUSTERED COLUMNSTORE key is defined on a table, the rows of that table will be stored in a highly compressed columnar format on disk. We describe the details of the format below. While MemSQL can execute any query on a columnstore table that it can execute on a rowstore, some queries are more suitable for columnstore tables than others. Some queries that can benefit from using columnstore include:

Columnstore tables are also not constrained by the amount of available memory, unlike rowstore tables.

Conceptual Differences Between Row and Column Stores

Row-oriented stores, or “rowstores” are the most common type of data stores used by relational databases. As the name suggests, a rowstore treats each row as a unit and stores all fields for a given row together in the same physical location. This makes rowstores great for transactional workloads, where the database frequently selects, inserts, updates, and deletes individual rows, often referencing either most or all columns.

Column-oriented stores, or “columnstores” treat each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. One is to scan each column individually, in essence, being able to scan only the columns needed for the query, with good cache locality during the scan. The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together. A simplified example is shown here:

Logical table and rowstore representation

ProductId Color Price
1 Red 10
2 Red 20
3 Black 20
4 White 20

Columnstore

ProductId
1
2
3
4
Color
Red x 2
Black
White

“x 2” indicates that the value is repeated twice.

Price
10
20 x 3

“x 3” indicates that the value is repeated three times.

By physically organizing data by columns, the database is able to scan and aggregate data for individual columns very quickly, simply because less data needs to be scanned. This makes columnstores well-suited to analytical workloads. The trade-off of using the columnstore is that it does not lend itself well to manipulating individual rows quickly. Columnstores are usually batch-loaded for optimal performance, although small-batch and streaming insert to columnstores can also perform well. The combination of efficient batch loading and processing of analytic queries make columnstore highly suitable for data warehousing scenarios.

Choosing a Columnstore or Rowstore

The following table enumerates the strengths and intended uses of each storage engine.

In-Memory Rowstore Flash, SSD, or Disk-based Columnstore
Operational/transactional workloads Analytical workloads
Fast inserts and updates over a small or large number of rows Fast inserts over a small or large number of rows
Random seek performance Fast aggregations and table scans
Updates/deletes are frequent Updates/deletes are rare
Compression

How the MemSQL Columnstore Works

In MemSQL there are a few concepts that are important to understand in order to make the best possible use of the columnstore:

Below is an example of a MemSQL columnstore index on the Products table using the Price column as its key.

Info

Column segments typically contain on the order of hundreds of thousands of rows. In this example, the segment size is 5 rows for readability.

Logical table and rowstore representation

ProductId Color Price Qty
1 Red 10 2
2 Red 20 2
3 Black 20 2
4 White 30 2
5 Red 20 2
6 Black 10 2
7 White 25 2
8 Red 30 2
9 Black 50 2
10 White 15 2
11 Red 5 2
12 Red 20 2
13 Black 35 2
14 White 30 2
15 Red 4 2

Columnstore - Each row segment contains one column segment per column in the table. For simplicity, this example contains the initial load of the table and has a single sorted row segment group.

Sorted row segment group #1 of 1

Row segment #1 of 3 image

Row segment #2 of 3 image

Row segment #3 of 3

image

“x N” indicates that the value is repeated N times.

Creating Efficient Columnstore Queries

Queries against tables with a columnstore index in MemSQL can take advantage of five characteristics of a columnstore index:

  1. All queries will be able to take advantage of the fact that only the column segments containing columns referenced in the query will need to be scanned, as well as the fact that the columnstore’s compression causes less data to need to be scanned. Taking the above table as an example. The query SELECT SUM(Qty) FROM Products; will only need to scan the three Qty column segments, each of which only contain a single value due to the compression.
  2. Some queries can be performed simply by reading the in-memory metadata for column segments referenced in the query. As an example, the query SELECT COUNT(*) FROM Products; will only need to read the row count and delete bitmask for all row segments to produce its result, completely eliminating the need to read column segments from disk. Queries that use the MIN or MAX aggregates can eliminate the need to read column segments from disk if no delete has been performed against the column segment.
  3. Some queries can reduce the number of segments that need to be read from disk by eliminating these segments based on the segment metadata (min and max values for the segment). The efficiency of this method depends on what percentage of segments can actually be eliminated.
    • For queries that filter on the key columns of the columnstore index, segment elimination is typically very efficient as segments within each row segment will not cover overlapping value ranges. For example, in the above table the query SELECT AVG(Price), AVG(Qty) FROM Products WHERE Price BETWEEN 1 AND 10; will eliminate all segments except row segment: #1 column segment: Price 4-15 and row segment: #1 column segment: Qty 2-2 which will be scanned.
    • For queries that filter on the non-key columns that don’t correlate with the key columns, segment elimination can be much less valuable as segments within each row segment can cover overlapping value ranges. For example, in the above table, the query SELECT AVG(Price) FROM Products WHERE Color = 'Red'; will be forced to scan all segments for the Price and Color columns as no segment can be eliminated when the value Red is contained in all segments of the Color column.
  4. Queries that join tables on columns that are the index columns of a columnstore index can be performed very efficiently through the use of a merge join algorithm allowing the join to be performed by simply scanning two segments that need to be joined in lock-step.
  5. Certain types of data allow filters and group-by operations to be performed without decompressing data from its serialized-for-disk format. This greatly improves performance by reducing the amount of data that need to be processed, especially when the cardinalities of the involved columns are low. This optimization is only performed in cases when execution run time would be improved. See Understanding Operations on Encoded Data for more information.

Writing Columnstore Data

Unlike other columnstore implementations, MemSQL supports very fast, small-batch writes (such as single row inserts) directly into columnstore tables. This is implemented by storing newly written rows in a row-oriented skiplist before flushing them to the column-oriented format. This is all handled automatically behind the scenes, and rows are visible to reads as soon as they are committed to the row-oriented skiplist.

When using MemSQL columnstore indexes, the following are changes that can occur against the data store.

  1. Insert - Inserts into a columnstore index will either go into the rowstore-backed or create a new columnstore-backed row segment. This is automatically determined by the engine based on heuristics involving the size of the insert and the current state of the columnstore index. If an insert is large enough that it would write directly to the columnstore, it is an optimized insert. An optimized insert will load a set of data and pre-sort it in the order of the columnstore key column(s); this will cause the load to always create a single new row segment. By default, INSERT or LOAD DATA queries writing more than 16 MB per partition will produce an optimized insert.
  2. Delete - Deleting a row in a columnstore index causes the row to be marked as deleted in the segment metadata, leaving the data in place within the row segment. Segments which only contain deleted rows are removed, and the optimization process covered below will compact segments that require optimization.
  3. Update - An update in a columnstore index is internally performed as a delete followed by an insert within a transaction. As with inserts, small enough updates will write to the rowstore-backed row segment.
  4. Optimization - MemSQL has optimization routines that run automatically and can also be started manually. These routines attempt to automatically merge row segments together in order to improve query efficiency. DML statements can be performed while columnstore optimizations take place. For more information, see the OPTIMIZE TABLE topic.

Managing Columnstore Segments

A columnstore table will have the best performance if the rows in the table are in global sorted order across all the row segments. In reality, maintaining such an order is not feasible in the presence of continuous writes.

MemSQL uses an advanced algorithm that allows it to maintain the order as close to sorted as possible, while data is being ingested or updated. Such a process is called a background merger and is constantly running in the background if the order of the row segments can be improved.

Background merger is optimistic, in that if at any point it tries to move around data which is also being changed by a concurrent UPDATE or DELETE query, it will discard all the work it has done so far and start over. It works on a small chunk of data at a time, so it is always a relatively small amount of work that is being discarded. However, in the presence of a very heavy update workload it can be a significant slowdown compared to a pessimistic merger, which locks the row segments it is currently processing. A user can manually trigger a pessimistic merger by running an OPTIMIZE TABLE command. We will explain below how to decide whether such a command is necessary, and how to run it.

MemSQL uses a concept of a sorted row segment group to describe a set of row segments that are sorted together. Row segments form a sorted row segment group if and only if there is an order on the row segments such that for each row segment the smallest row in it is no smaller than the largest row in any row segment before it. Here and below when we say that one row is smaller than another row, we mean that the values of columns of the CLUSTERED COLUMNSTORE key of that row are smaller than those of the other row.

If the data had a perfect global order, it would consist of a single sorted row segment group. If the data is in a completely random order, it is likely to comprise as many sorted row segment groups as there are row segments. The goal of the background merger is to reorganize the rows among row segments in such a way that the number of sorted row segment groups is as small as possible.

To inspect the current state of the sorted row segment groups of a particular table, run the SHOW COLUMNAR MERGE STATUS FOR <table_name> command:

    memsql> SHOW COLUMNAR MERGE STATUS FOR groups;
    +- -----------------+- ------+- ------------------------+- ---------+- ----------+
    | Merger           | State | Plan                    | Progress | Partition |
    +- -----------------+- ------+- ------------------------+- ---------+- ----------+
    | (Current groups) | NULL  | 741,16,1                | NULL     |         0 |
    | (Current groups) | NULL  | 782,20                  | NULL     |         1 |
    | (Current groups) | NULL  | 701,40,5                | NULL     |         2 |
    | (Current groups) | NULL  | 326,207,123,37,21,19,17 | NULL     |         3 |
    +- -----------------+- ------+- ------------------------+- ---------+- ----------+

Let’s look closely at the first row of the result. According to it, the slice of the table that is stored on partition 0 has three sorted row segment groups, one consists of 741 row segments, one consists of 16 row segments, and one consists of a single row segment - a total of 758 row segments. Consider the impact of such a split into sorted row segment groups on a very simple query like

SELECT * FROM groups WHERE user_group = 15;

By the definition of the sorted row segment group, the very first sorted row segment group will have at most one row segment that contains rows with user_group equal to 15, unless user_group = 15 is on the boundary of two row segments, or if there is a large data skew and several row segments consist only of rows with user_group = 15. Similarly, at most one row segment in the second sorted row segment group contains relevant rows, and the only segment of the third sorted row segment group might also contain relevant rows. This way, only three row segments out of the total of 758 will be opened and materialized. While the query in this example is very simple, similar reasoning works for significantly more complex queries.

Now take a look at the sorted row segment groups on partition 3. Clearly, it is significantly less optimized than the remaining three, and a select query like the one shown above will result in materializing 8 row segments. If the background merger is enabled, and no workload is running concurrently, within several seconds this partition would get optimized. However, in the presence of a heavy workload, the optimistic background merger might fall behind. In this case, it might be reasonable to manually trigger a pessimistic merger by calling:

    memsql> OPTIMIZE TABLE groups
    Empty set (32.36 sec)

If we run SHOW COLUMNAR MERGE STATUS while OPTIMIZE TABLE is being executed, we might see the manual merger in action:

    memsql> SHOW COLUMNAR MERGE STATUS FOR groups;
    +- -----------------+- ---------+- ------------------------+- ---------+- ----------+
    | Merger           | State    | Plan                    | Progress | Partition |
    +- -----------------+- ---------+- ------------------------+- ---------+- ----------+
    | (Current groups) | NULL     | 741,16,1                | NULL     |         0 |
    | (Current groups) | NULL     | 782,20                  | NULL     |         1 |
    | (Current groups) | NULL     | 701,40,5                | NULL     |         2 |
    | (Current groups) | NULL     | 326,207,123,37,21,19,17 | NULL     |         3 |
    | Manual Merger    | Working  | 326+207+123+37+21+19+17 | 53.12%   |         3 |
    +- -----------------+- ------+- ---------------------------+- ---------+- ----------+

What this new row indicates is that there is a manual merger running on the partition 3, and that at this time it has done 53.12% of the work.

When the merger is done, the table now is in a better shape:

    memsql> SHOW COLUMNAR MERGE STATUS FOR groups;
    +- -----------------+- ------+- ---------+- ---------+- ----------+
    | Merger           | State | Plan     | Progress | Partition |
    +- -----------------+- ------+- ---------+- ---------+- ----------+
    | (Current groups) | NULL  | 741,16,1 | NULL     |         0 |
    | (Current groups) | NULL  | 782,20   | NULL     |         1 |
    | (Current groups) | NULL  | 701,40,5 | NULL     |         2 |
    | (Current groups) | NULL  | 730,20   | NULL     |         3 |
    +- -----------------+- ------+- ---------+- ---------+- ----------+

Note that at no point were any of the partitions merged into a single sorted row segment group in this example. The reason for that is that both optimistic and pessimistic merger use an advanced algortihm that is optimized to do small amortized chunks of work in the presence of concurrent writes and maintain data in a few sorted row segment group, rather than to attempt to merge all the data into a single sorted row segment group. In cases when it is acceptable to sacrifice some time on data ingestion to achieve even higher SELECT performance, it is possible to run a manual command that merges data on each partition into a single sorted row segment group:

    memsql> OPTIMIZE TABLE groups FULL
    Empty set (57.36 sec)

    memsql> SHOW COLUMNAR MERGE STATUS FOR groups;
    +- -----------------+- ------+- -----+- ---------+- ----------+
    | Merger           | State | Plan | Progress | Partition |
    +- -----------------+- ------+- -----+- ---------+- ----------+
    | (Current groups) | NULL  | 758  | NULL     |         0 |
    | (Current groups) | NULL  | 802  | NULL     |         1 |
    | (Current groups) | NULL  | 746  | NULL     |         2 |
    | (Current groups) | NULL  | 750  | NULL     |         3 |
    +- -----------------+- ------+- -----+- ---------+- ----------+

At this time any highly selective select will materialize only one row segment per partition.

Info

Unlike OPTIMIZE TABLE <name>, which takes amortized time proportional to the size of recently loaded data, OPTIMIZE TABLE <name> FULL always takes time in the order of magnitude of the size of the entire table, unless data in that table is already sorted.

When inserting a small amount of rows into the columnstore table, an in-memory rowstore-backed segment is used to store the rows. As this rowstore-backed segment fills, the background flusher periodically will flush these rows to disk. A rowstore-backed segment can be flushed to disk manually by running OPTIMIZE TABLE <table_name> FLUSH.

    memsql> OPTIMIZE TABLE t FLUSH;

Advanced Columnstore Configuration Options

Warning
Configuration options described in this section may have significant performance impact on the workload. Make sure to test on a staging environment first before deploying the configuration changes.

Configuring segment size in columnstore tables

In MemSQL, data in a columnstore table are organized into multiple row segments. For certain workloads, the size of row segments significantly affects performance. Considerations include:

In MemSQL, the default size of the row segments is controlled by the global variable columnstore_segment_rows. By default, columnstore_segment_rows has a value of 1024000, meaning that each segment contains 1024000 rows by default.

Info
  • The variable columnar_segment_rows from previous MemSQL versions was deprecated in favor of columnstore_segment_rows. columnar_segment_rows exists as an alias to columnstore_segment_rows.

  • A cluster upgraded from a version before 6.0 will have the previous default value of 102400 for columnstore_segment_rows. See Upgrading MemSQL to 6.0 for more information.

In addition to the global variable, it is possible to overwrite the global setting and set the segment size for a specific columnstore table. This can be done during the table creation, or by altering an existing table. For example:

memsql> CREATE TABLE t (id INT, KEY (id) USING CLUSTERED COLUMNSTORE WITH (columnstore_segment_rows=100000));

memsql> SHOW CREATE TABLE t;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                  |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `id` int(11) DEFAULT NULL,
  KEY `id` (`id`) /*!90619 USING CLUSTERED COLUMNSTORE */ /*!90621 WITH(COLUMNSTORE_SEGMENT_ROWS=100000) */
) |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

Now, modify the value of columnstore_segment_rows for this columnstore. The MODIFY KEY keyName clause identifies the index whose settings are to be changed (in this case, the columnstore index).

memsql> ALTER TABLE t MODIFY KEY id SET (columnstore_segment_rows=20000);

memsql> SHOW CREATE TABLE t;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `id` int(11) DEFAULT NULL,
  KEY `id` (`id`) /*!90619 USING CLUSTERED COLUMNSTORE */ /*!90621 WITH(COLUMNSTORE_SEGMENT_ROWS=20000) */
) |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

Configuring the rowstore-backed segment size in columnstore tables

In addition to the on-disk segments, each MemSQL columnstore also has an in-memory rowstore-backed segment. Small-batch inserts write to the rowstore-backed segment first, while the background flusher process periodically compresses those recently inserted rows and creates on-disk segments.

The size of the rowstore-backed segment is controlled by the global variable columnstore_flush_bytes. The background flusher process starts to create on-disk segments when the amount of data in the rowstore-backed segment exceeds columnstore_flush_bytes, with a default of 32 MB. Additionally, insert and load operations are considered small-batch if they write to each partition less than columnstore_flush_bytes * columnstore_disk_insert_threshold, which is by default 16 MB at a time.

The columnstore_disk_insert_threshold value is a fractional value with a default of 0.5. It controls the minimum size of a disk-backed row segment created by insert and load operations. Note that if OPTIMIZE TABLE FLUSH is manually run, the minimum segment size can be much smaller.

Considerations about tuning the rowstore-backed segment size include:

Similar to columnstore_segment_rows, the columnstore_flush_bytes value can also be configured per-table with the following syntax:

memsql> CREATE TABLE t2 (id INT, KEY (id) USING CLUSTERED COLUMNSTORE WITH (columnstore_flush_bytes=4194304, columnstore_segment_rows=100000));

memsql> ALTER TABLE t2 MODIFY KEY id SET (columnstore_flush_bytes=8388608);

Prefetching

When a query matches multiple segments which are not currently cached in memory, it can benefit from prefetching. Prefetching improves performance on these types of queries by requesting the file from disk for the next segment while processing the rows of the current segment. Prefetching is enabled by default for all MemSQL columnstore tables.

System Recommendations for Optimal Columnstore Performance

See the Columnstore Recommendations section of Installation Best Practices.

Query Processing Operations on Encoded Data

MemSQL performs some query processing operations directly on encoded data in columnstore tables. Columnstore data is stored encoded in a number of different forms, including dictionary encoding, run-length encoding, and value encoding. For these encodings, certain operations, including filtering and aggregation, can be run very efficiently, through the use of special processing techniques, as well as Single-Instruction, Multiple-Data (SIMD) instructions on processors that support the Intel AVX2 instruction set.

For more information and examples of these operations, see Understanding Operations on Encoded Data.

Was this article useful?