Choosing a Table Storage Type
The following table enumerates the strengths and intended uses of the rowstore and columnstore.
|In-Memory Rowstore||Flash, SSD, or Disk-based Columnstore|
|Operational/transactional workloads||Analytical workloads|
|Fast inserts and updates over a small or large number of rows||Fast inserts over a small or large number of rows|
|Random seek performance||Fast aggregations and table scans|
|Updates/deletes are frequent||Updates/deletes are rare|
Conceptual Differences Between Row and ColumnStores
Row-oriented stores, or “rowstores” are the most common type of data stores used by relational databases. As the name suggests, a rowstore treats each row as a unit and stores all fields for a given row together in the same physical location. This makes rowstores great for transactional workloads, where the database frequently selects, inserts, updates, and deletes individual rows, often referencing either most or all columns.
Column-oriented stores, or “columnstores” treat each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. One is to scan each column individually, in essence, being able to scan only the columns needed for the query, with good cache locality during the scan. The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together. A simplified example is shown here:
Logical table and rowstore representation
|Red x 2|
“x 2” indicates that the value is repeated twice.
|20 x 3|
“x 3” indicates that the value is repeated three times.
By physically organizing data by columns, the database is able to scan and aggregate data for individual columns very quickly, simply because less data needs to be scanned. This makes columnstores well-suited to analytical workloads. The trade-off of using the columnstore is that it does not lend itself well to manipulating individual rows quickly. Columnstores are usually batch-loaded for optimal performance, although small-batch and streaming insert to columnstores can also perform well. The combination of efficient batch loading and processing of analytic queries make columnstore highly suitable for data warehousing scenarios.
- Training: MemSQL Storage