MemSQL supports storing and processing data using two types of data stores: a completely in-memory rowstore and a disk-backed columnstore. The MemSQL rowstore and columnstores differ both in storage format (row vs. column) and in storage medium (RAM vs. disk or SSD). MemSQL allows querying rowstore and columnstore data together in the same query.
The rowstore is typically used for highly concurrent online transaction processing (OLTP) and hybrid transactional/analytical (HTAP) workloads with fast row lookup requirements. The columnstore is used primarily for analytical and HTAP applications where the queries often reference large datasets that do not fit in memory.
The total combined size of all rowstore tables is limited by the total available RAM on the leaf nodes in the cluster. It is important to reserve a reasonable amount of RAM, say 20%, for query execution. So the effective total rowstore table capacity will be reduced by this amount. See Memory Management and Identifying and Reducing Memory Usage for more details on configuring memory limits and troubleshooting out-of-memory conditions.
For applications that do many point lookups and small inserts, updates, and deletes, rowstore performs much better than columnstore. In general, since rowstores support a larger variety of workloads, they are a good starting point.
This section provides a conceptual overview of MemSQL’s rowstore and includes considerations for optimizing your database performance using the rowstore.
Creating a Rowstore Table
Rowstore is the default table storage format. Typically you will specify a shard key and one or more indexes for a rowstore, although a shard key and indexes are optional. A primary key, which is enforced to be unique, is also supported.
Here is an example of a statement that creates a rowstore table:
CREATE TABLE products ( ProductId INT, Color VARCHAR(10), Price INT, dt DATETIME, KEY (Price), SHARD KEY (ProductId) );
We define a
SHARD KEY to explicitly control the data distribution. We
define it on
ProductId since sharding on a high cardinality
column or columns generally allows for a more even distribution and
prevents skew. The KEY specified on
Price causes an index to be created on
It is also possible to randomly distribute data by
either omitting the shard key, or defining an empty shard key
SHARD KEY(), as long as no primary key is defined.
Rowstores with multiple indexes support very fast seeking to find one or a small number of rows via several different keys, or “access paths.” MemSQL rowstores are able to provide extremely fast lookup along multiple different access paths with low variance in response time from query to query.
The following statement creates a
products table with keys on both
Color, and a unique (primary) key on
CREATE TABLE products ( ProductId INT, Color VARCHAR(10), Price INT, dt DATETIME, KEY (Price), KEY (Color), PRIMARY KEY(ProductId), SHARD KEY (ProductId) );
Primary keys must contain all the columns in the shard key so they can be enforced efficiently by looking at data in only a single shard (partition). Creating a primary key without specifying a shard key automatically shards on the primary key.
You can also create a key by creating an index on a rowstore using the CREATE INDEX statement.
Rowstore data is fully persistent and updates on rowstore tables are done in transactions. Persistence is implemented for rowstores using periodic snapshots of the in-memory data, and a write-ahead log, which are both stored on a file system to make them permanent. If a MemSQL node restarts, all its rowstore data will be recovered from the snapshots and log and the in-memory state for the rowstore will be rebuilt.
Porting applications to MemSQL rowstores
If you are porting an application from another database that uses a disk-based rowstore structure based on B-tree, ISAM or a similar access method, you should start by using a MemSQL rowstore with a key for each index on the original system. If the original system had a “heap” access method (unordered rowstore), you can start with a MemSQL rowstore with no key.
OLTP operations on large data sets
For OLTP operations on large tables, the total cost of provisioning enough RAM, and possibly enough servers, when using rowstores, can become a significant concern for some users. If this is the case, and the application doesn’t need the very fastest row lookup time, consider using columnstore tables with hash indexes on the most frequently-used lookup key columns instead. Or, for wide tables with many null values, the total cost of ownership (TCO) can be reduced by using rowstore tables with SPARSE data compression.
For more information on columnstore tables and how to decide whether to use a rowstore or columnstore, see Columnstore.
See The Story Behind MemSQL’s Skiplist Indexes for details about how standard rowstore table indexes are implemented in MemSQL.
See MemSQL SingleStore – And Then There Was One for additional information about (1) using columnstores with hash indexes, sub-segment access, and fine-grain locking to enable OLTP operations on data bigger than will fit in RAM, and (2) using SPARSE rowstore compression to reduce TCO for rowstore tables with many NULL values.