Data Types

MemSQL supports the following data types. The sizes given are for in-memory rowstore tables. Columnstore (on-disk) tables are stored differently, and often compressed.

Integer Numbers

Data Type Size Size (Not Null) Synonyms Min Value Max Value
BOOL* (see note below) 2 bytes 1 byte BOOLEAN -128 127
BIT 9 bytes 8 bytes
TINYINT 2 bytes 1 byte -128 127
SMALLINT 4 bytes 2 bytes -32768 32767
MEDIUMINT 4 bytes 3 bytes -8388608 8388607
INT 8 bytes 4 bytes INTEGER -2147483648 2147483647
BIGINT 12 bytes 8 bytes -2 ** 63 (2 ** 63) - 1

Note:

  • BOOL and BOOLEAN are synonymous with TINYINT. A value of 0 is considered FALSE, non-zero values are considered TRUE.
Info

Each integer type can be “unsigned”, doubling the positive range of values and disallowing values under 0.

Real Numbers

Data Type Size Size (Not Null) Synonyms Precision
DOUBLE 8 bytes 8 bytes REAL 24 to 53 bits
FLOAT 4 bytes 4 bytes 23 bits
DECIMAL see note see note DEC, FIXED, NUMERIC 65 digits
Info

The space consumed by the DECIMAL datatype varies with the precision and scale arguments. The syntax for defining a DECIMAL (or its synonyms DEC, FIXED, and NUMERIC) is in the form DECIMAL(P, S) where P is the total number of digits (precision) and S (scale) is how many of those digits appear after the decimal point. For example, DECIMAL(10,5) will store numbers with 5 digits before the decimal and 5 after.

The formula is ceil(P / 9) * 4. In our example, it would be ceil(10 / 9) * 4, or 8 bytes.

Time and Date

Data Type Size Size (Not Null) Resolution Min Value Max Value
DATE 8 bytes 4 bytes day 1000-01-01 9999-12-31
TIME 8 bytes 4 bytes second -838:59:59 838:59:59
TIMESTAMP 4 bytes 4 bytes second 1970-01-01 00:00:01 2038-01-19 03:14:07
TIMESTAMP(6) 8 bytes 8 bytes microsecond 1970-01-01 00:00:01.000000 2038-01-19 03:14:07.999999
DATETIME 12 bytes 8 bytes second 1000-01-01 00:00:00 9999-12-31 23:59:59
DATETIME(6) 12 bytes 8 bytes microsecond 1000-01-01 00:00:00.000000 9999-12-31 23:59:59.999999
YEAR 2 bytes 1 byte year 1901 2155

Inserting or Updating a Column With the Current Timestamp

To allow inserts or updates of a TIMESTAMP, TIMESTAMP(6), DATETIME , or a DATETIME(6) column with the current timestamp, define the column with the clause:

  • DEFAULT <timefunction>, to insert a column with the current timestamp.

  • ON UPDATE <timefunction>, to update a column with the current timestamp.

  • DEFAULT <timefunction> ON UPDATE <timefunction>, to insert and update a column with the current timestamp.

where <timefunction> is one of the following values: NOW(), NOW(6), CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP(6).

Note: It is recommended to use DATETIME(6) column type instead of TIMESTAMP(6) because the upper limit of TIMESTAMP(6) is 2038-01-19 as opposed to 9999-12-31 for DATETIME(6).

Examples

The following example demonstrates the use of DEFAULT CURRENT_TIMESTAMP() in column definition:

CREATE TABLE s(id INT, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP());
INSERT INTO s(id) VALUES(1);

INSERT INTO s(id) VALUES(2);

SELECT * FROM s;
****
+------+---------------------+
| id   | ts                  |
+------+---------------------+
|    1 | 2020-05-21 09:12:25 |
|    2 | 2020-05-21 09:12:25 |
+------+---------------------+

The following example demonstrates the use of ON UPDATE CURRENT_TIMESTAMP(6) in column definition:

CREATE TABLE s1(id INT, ts DATETIME(6) ON UPDATE CURRENT_TIMESTAMP(6));

INSERT INTO s1(id,ts) VALUES(1,"2017-07-01");

SELECT * FROM s1;
****
+------+----------------------------+
| id   | ts                         |
+------+----------------------------+
|    1 | 2017-07-01 00:00:00.000000 |
+------+----------------------------+

UPDATE s1 SET id = 2 WHERE id = 1;

SELECT * FROM s1;
****
+------+----------------------------+
| id   | ts                         |
+------+----------------------------+
|    2 | 2020-05-21 09:36:47.124979 |
+------+----------------------------+

The following example demonstrates the use of DEFAULT NOW(6) ON UPDATE NOW(6) in column definition:

CREATE TABLE s2(id INT, ts DATETIME(6) DEFAULT NOW(6) ON UPDATE NOW(6));

INSERT INTO s2(id) VALUES(1);

INSERT INTO s2(id) VALUES(2);

SELECT * FROM s2;
****
+------+----------------------------+
| id   | ts                         |
+------+----------------------------+
|    2 | 2020-05-21 11:21:30.998275 |
|    1 | 2020-05-21 11:21:30.992868 |
+------+----------------------------+

UPDATE s2 SET id = 2 WHERE id = 1;

SELECT * FROM s2;
****
+------+----------------------------+
| id   | ts                         |
+------+----------------------------+
|    2 | 2020-05-21 11:21:30.998275 |
|    2 | 2020-05-21 11:23:39.134793 |
+------+----------------------------+
Warning

Prior to MemSQL 6, when you inserted or updated a record, the first TIMESTAMP or TIMESTAMP(6) column would automatically be populated with the current timestamp. Such columns did not use DEFAULT CURRENT_TIMESTAMP() or ON UPDATE CURRENT_TIMESTAMP() clause in their column definitions. In MemSQL 6 and later, this behavior has been deprecated; TIMESTAMP or TIMESTAMP(6) columns without these clauses will be populated with a NULL value instead of the current timestamp. For instance, NULL values are populated in the ts column in the following example, when run on MemSQL 6 or later.

memsql> CREATE TABLE r(d DATETIME(6), ts TIMESTAMP(6), id INT);

memsql> INSERT r(d, id) VALUES("2017-07-01", 1);

memsql> SELECT * FROM r;
+----------------------------+----------------------------+------+
| d                          | ts                         | id   |
+----------------------------+----------------------------+------+
| 2017-07-01 00:00:00.000000 | NULL                       |    1 |
+----------------------------+----------------------------+------+

memsql> UPDATE r SET id = 2 WHERE id = 1;

memsql> SELECT * FROM r;
+----------------------------+----------------------------+------+
| d                          | ts                         | id   |
+----------------------------+----------------------------+------+
| 2017-07-01 00:00:00.000000 | NULL                       |    2 |
+----------------------------+----------------------------+------+

The explicit_defaults_for_timestamp Variable

The variable explicit_defaults_for_timestamp determines if certain nonstandard behaviors for default values and NULL-value are disabled for the TIMESTAMP column. By default, the variable is enabled, which disables the nonstandard behaviors.

If the explicit_defaults_for_timestamp variable is enabled, MemSQL handles the TIMESTAMP column behavior as follows:

  • Assigning a TIMESTAMP column value of NULL does not set it to the current timestamp. To set to the current timestamp, set the TIMESTAMP column to CURRENT_TIMESTAMP() or NOW().

  • If the TIMESTAMP column is not declared as NOT NULL, then it is automatically declared with the NULL attribute and permits NULL values. Assigning a value of NULL to the column sets it to NULL, not the current timestamp.

  • TIMESTAMP column declared with the NOT NULL attribute do not permit NULL values.

  • TIMESTAMP columns explicitly declared with the NOT NULL attribute and without an explicit DEFAULT attribute are treated as having no default value.

  • To declare TIMESTAMP column with the DEFAULT CURRENT_TIMESTAMP() or ON UPDATE CURRENT_TIMESTAMP() attributes, these must be explicitly specified.

  • The first TIMESTAMP column in a table is handled in the same way as the TIMESTAMP columns following the first one.

If the explicit_defaults_for_timestamp variable is disabled, MemSQL handles the TIMESTAMP column behavior as follows:

  • If the TIMESTAMP column is not declared as NULL, then it is automatically declared with the NOT NULL attribute. Assigning a value of NULL to the column is permitted and sets the column to the current timestamp.

  • The first TIMESTAMP column in a table, if not declared as NULL, DEFAULT CURRENT_TIMESTAMP() or ON UPDATE CURRENT_TIMESTAMP(), is automatically declared with the DEFAULT CURRENT_TIMESTAMP() and ON UPDATE CURRENT_TIMESTAMP() attributes.

  • TIMESTAMP columns following the first one, if not declared with the NULL attribute or a DEFAULT attribute, are automatically declared as DEFAULT ‘0000-00-00 00:00:00’ (the “zero” timestamp).

String Types

Fixed-Length String

There are three fixed-size string types in MemSQL. The space consumed is directly proportional to their length definitions. Values of type CHAR require 3 x LENGTH bytes of memory, as MemSQL supports only the UTF8 character set. Values of type BINARY require LENGTH bytes of memory.

Data Type Size Size (Not Null)
CHAR(LENGTH) 3*LENGTH + 4 bytes 3*LENGTH bytes
BINARY(LENGTH) LENGTH + 4 bytes LENGTH bytes

A column declared as CHAR() stores values as per the defined length. When CHAR values are stored, they are right-padded with spaces to the specified length but when the CHAR values are retrieved, trailing spaces are removed.

CREATE TABLE t1 (c1 CHAR(3));
INSERT INTO t1 VALUES ('  ');
INSERT INTO t1 VALUES ('ab ');
INSERT INTO t1 VALUES ('abcd');
SELECT c1, LENGTH (c1) FROM t1;
****
+------+-------------+
| c1   | LENGTH (c1) |
+------+-------------+
| abc  |           3 |
| ab   |           2 |
|      |           0 |
+------+-------------+

Variable-Length Strings

There are several variable-length string types. The data for these variable types are stored in a memory allocation designed for them, and pointers to their locations in memory are stored in the main table structure. This means that the space consumed by a particular value is the byte length of the string, plus an 8-byte overhead for the pointer, and a 4-byte overhead for storing the length. (There is an additional 4-byte overhead if the field is nullable.) One exception is values less than 7 bytes long. Those small values are stored directly in the pointer area.

The three main types are VARCHAR, VARBINARY, and LONGTEXT. While they all operate similarly for most English-language alphanumeric strings, there are several important differences in their behavior. The length limit of the VARCHAR is measured in Unicode characters, not in bytes. VARBINARY fields store data verbatim, regardless of whether they are valid Unicode strings. Sorting and comparing a VARBINARY uses the exact binary representation.

Data Type Max Length
VARCHAR 21,845 characters
VARBINARY system RAM
LONGTEXT 4 GB
Info

There are several variants of the BLOB and TEXT types, listed below. Aside from their maximum lengths, there is no practical difference between these types in MemSQL, including performance or memory efficiency. They are implemented solely for backwards compatibility with MySQL syntax.

Name Max Length Description
LONGBLOB 4 GB VARBINARY
MEDIUMBLOB 16 MB VARBINARY
BLOB 65,535 bytes VARBINARY
TINYBLOB 255 bytes VARBINARY
MEDIUMTEXT 16 MB LONGTEXT
TEXT 65,535 bytes LONGTEXT
TINYTEXT 255 bytes LONGTEXT

JSON Type

The JSON type can contain any valid Javascript Object Notation data, including maps, arrays, and nested types. For more information see the JSON Guide .

Data Type Size Size (Not Null)
JSON 20 bytes + data size 16 bytes + data size

Geospatial Types

Geospatial types contain information about points, paths, and polygons on the surface of the Earth. For more information, see the Geospatial Guide.

Data Type Size Size (Not Null)
GEOGRAPHYPOINT 8 bytes 8 bytes
GEOGRAPHY See note
Info

The space consumed by the path and polygon data in Geography columns is dependent on the complexity of the objects. Very roughly, a Geography value consumes 24 bytes per vertex, plus a 64 byte overhead, plus some minor overhead for each “ring” in a multipolygon.

Indexing a Geography object may also consume more space than indexes over other datatypes. The rule of thumb is 40 bytes * the value of the RESOLUTION parameter of the index, per row.

MemSQL Procedural SQL (MPSQL) Types

Two data types are only supported in MemSQL Procedural SQL (MPSQL) scalar-valued functions and stored procedures: ARRAY and RECORD. The QUERY data type is only supported in stored procedures. For more information about these types, see the ARRAY, RECORD, and QUERY topics.

Other Types

Data Type Size Size (Not Null) Max cardinality Max values per row
ENUM 4 bytes 2 bytes 65,535 unique values 1
SET 12 bytes 8 bytes 64 unique values 64

Comparing mismatched datatypes

Using comparisons between mismatched datatypes may degrade query performance and may use unsafe type conversions which can yield undesirable query results. MemSQL will display a warning for queries with potentially problematic comparisons between mismatched datatypes in the EXPLAIN and information_schema records for the query.

These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different datatypes. But these type mismatches are flagged to help you identify potential problems that you may not be aware of.

When you see these warnings, you should check whether the datatypes are expected to be different. You may wish to consider changing the datatypes of the fields or values involved. You may also wish to consider adding explicit type conversion operations, such as the cast operators or functions like STR_TO_DATE which convert between different types.

These warnings can be seen in EXPLAIN <query>, information_schema.plancache.plan_warnings, information_schema.mv_queries.plan_warnings, and the PlanWarnings column of show plancache in textual form, as well as in EXPLAIN JSON <query>, information_schema.plancache.plan_info, and information_schema.mv_queries.plan_info in JSON form.

Example

An example of these warnings is the following:

CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));

EXPLAIN SELECT * FROM t WHERE id = 123;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.memsql.com/docs/mismatched-datatypes for more information. |
|                                                                                                                                                                                                                                                                        |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'.                                                                                                  |
|                                                                                                                                                                                                                                                                        |
| Gather partitions:all alias:remote_0                                                                                                                                                                                                                                   |
| Project [t.id]                                                                                                                                                                                                                                                         |
| Filter [t.id = 123]                                                                                                                                                                                                                                                    |
| TableScan db.t, PRIMARY KEY (id)                                                                                                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In this example, the query involves the expression t.i = 123, where t.i is a string field. When a numeric value is compared to a string, the string is converted to a numeric value, e.g. the string '123' is converted to the number 123.

This comparison may be problematic for multiple reasons:

  • The mismatched datatypes may indicate a mistake in how the query is written or how the table is defined. The query may behave differently than intended and yield undesired results. For example, all of the strings '123', '123.0', '0123', and '123abc' compare equal to the integer 123, so while the query may be intended to retrieve a single row with the specified id, this equality comparison may actually yield multiple rows whose id values all compare equal to 123.

  • The mismatched datatypes negatively impact performance of the query. If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching id. But because there are actually many possible string ids that match the number 123, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.

data_conversion_compatibility_level

Changing the value of data_conversion_compatibility_level sync variable can change the behavior of expressions in computed columns. This variable can have the following possible values: '6.0', '6.5', '7.0'.

If a computed column value changes due to a change in data_conversion_compatibility_level, columnstore sorting, indexes, and sharding can become logically corrupted. MemSQL does not recompile an existing plan when data_conversion_compatibility_level or sql_mode changes.

Info

sql_mode is persisted to the CREATE TABLE statement. Therefore, the computed column of a table always uses the sql_mode that the table is created with, which may be different from the current sql_mode.

Best practices to avoid data corruption

  • Review the computed column expressions when changing data_conversion_compatibility_level.
  • Change data_conversion_compatibility_level only as a part of an application upgrade process.
  • Perform application upgrade tests.

For example, if a computed column contains corrupted data and you have to switch to a higher data_conversion_compatibility_level, you may need to drop the computed column before changing the level. Once the level is increased, add the computed column to the table. Alternatively, if a persisted computed column is used in a columnstore key, you may have to create a new table and select data into this new table. After copying the data, drop the old table and rename the new table.

The following examples demonstrate how the behavior of expressions may change if data_conversion_compatibility_level is changed.

Example 1

SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
|               20190422 |
+------------------------+

Example 2

SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22             |
+------------------------+