Regular Expression Commands

Suppose you want to find the starting position of a string within another string. Call the former the search string and the latter the target string. To find dog within The dog chased the cat, you run the query SELECT INSTR('The dog chased the cat','dog');. This query returns 5, indicating the position where it found dog.

But INSTR does not allow search strings that contain placeholders. The following examples show search strings containing placeholders, denoted by the ? symbol.

Search String Placeholder
The dog chased the ? Any string
The ? chased the cat Any three letter string
The dog ? the cat The string chased or found

Regular expressions enable you to define search strings with placeholders. You specify these strings using patterns. For instance, the pattern The .{3,5} chased the mouse|frog matches many target strings. Two of them are The dog chased the frog and The rat chased the mouse.

Patterns contain literals and metacharacters. Literals are characters that a pattern matcher attempts to locate in the target string exactly as the characters are written in the search string. Metacharacters describe the placeholders that the pattern uses.

The pattern The .{3,5} chased the mouse|frog contains the literals The,chased,the,mouse and frog. It contains the metacharacters .,{3,5} and |.

Regular Expression Formats

MemSQL supports extended regular expressions (ERE) as defined in the POSIX standard and advanced regular expressions (ARE) as defined in PostgreSQL. ARE are nearly a superset of ERE and support common Perl regular expression extensions.

You can use both formats with the built-in MemSQL functions RLIKE, REGEXP,REGEXP_INSTR and REGEXP_REPLACE, which are described in the next section.

Set the global variable regexp_format to specify the regular expression format to be used by the built-in functions you call.

To use ERE globally, run:

SET GLOBAL regexp_format = 'extended';

To use ARE globally, run:

SET GLOBAL regexp_format = 'advanced';

By default, regexp_format is set to 'extended'.

Alert

You can only set regexp_format globally; all nodes in the cluster will use the global setting.

Alert

You should set regexp_format to 'advanced' if you are migrating regular expression logic from another database system and this logic uses advanced regular expressions or common Perl regular expression extensions.

You should set regexp_format to 'advanced' if you are developing new regular expression logic.

Regular Expression Functions

MemSQL has four built-in functions that allow you to work with regular expressions.

  • REGEXP: Searches a string for a regular expression pattern and returns 1 if the string is found, 0 otherwise.

  • RLIKE: Has the same functionality as REGEXP.

  • REGEXP_INSTR: Searches a string for a regular expression pattern and returns an integer indicating the beginning position or ending position of the matched substring.

  • REGEXP_REPLACE: Searches a string for the first occurrence of a regular expression pattern and replaces that occurrence with a replacement string. Can also perform a search and replace of all occurrences.

Extended Regular Expression (ERE) Support

MemSQL supports ERE as defined in the POSIX standard. This support is summarized below.

Common metacharacters used in ERE are listed in the following table.

Metacharacter Meaning Examples
. Matches any single character. a.bc matches aabc, abbc, apbc, azbc, a8bc, a)bc, etc.
? Matches the previous element zero or one time. xy?z matches xz, xyz
* Matches the previous element zero or more times. xy*z matches xz, xyz, xyyz, xyyyz etc.
+ Matches the previous element one or more times. xy+z matches xyz, xyyz, xyyyz etc.
^ Matches the beginning of a string. ^hi matches hibye but not ibye
$ Matches the end of a string. ye$ matches hibye but not y
abc|def Matches the characters to the left or right of |. abc|def matches abc, def
[ab] Matches either a or b. [xyz] matches x, y, or z, but not e
^[ab] Matches any single character except a or b. ^[xyz] matches p or q, but not x
{n} Matches the previous element n times. ab{3}z matches abbbz
{m,} Matches the previous element m or more times. ab{3,}z matches abbbbbz
{n,m} Matches the previous element n to m times, where n < m. ab{1,3}z matches abz, abbz, abbbz
[:range:] Matches a range of characters, also called a character class. [:digit:] matches 5. [:alpha:] matches b

Patterns may contain multiple metacharacters, as shown in the following examples.

Pattern Example Matches
a.b.c aybzc, a4b6c
ab?c?d. abcdt, ads, abdr

Metacharacters often match elements. An element is a single character or a group of characters enclosed in parentheses. A group is matched as a unit. Groups can be nested. The following examples show patterns that use groups.

Pattern Example Matches
a([:digit:]ef){2}g a3ef5efg,a1ef8efg
a(bcd)|(efg)h abcdh, aefgh
z(ab)*(cd)+ zcd, zabcd, zababcd
z((ab)|(cd){2}e){2}y zababy, zabcdcdey, zcdcdecdcdey
Info

The | metacharacter can match a series of characters not enclosed in the parentheses. For example, the pattern abc|def matches abc or def. But | has the lowest precedence of all metacharacters. For more complicated patterns involving |, you will need to group the characters you want to match. For example, to match abc or two occurences of def, use the pattern abc|(def){2}.

Advanced Regular Expression (ARE) Support

MemSQL supports ARE as defined in PostgreSQL. This support is summarized below.

Backreferences

A pattern containing a backreference matches the same string that it matched earlier in the string. The earlier string must be enclosed in parentheses. A pattern may contain multiple backreferences that are denoted by \\n, where n is the number of the backreference.

For example, the pattern a(bc)d(ef)g\\1\\2 matches abcdefgbcef.

Escapes

Escapes make it easier to write some types of patterns. Common escapes are listed below. See section 9.7.3.3 of the PostgreSQL pattern matching documentation for a full explanation of the supported escapes.

Metacharacter Meaning
\d Matches any digit.
\s Matches a space.
\w Matches any alpha-numeric character including the underscore.
\D Matches any character except a digit.
\S Matches any character except a space.
\W Matches any non-alpha-numeric character. Does not match the underscore.
\m Matches the beginning of a word.
\M Matches the end of a word.
Info

In PostgreSQL, escapes in patterns begin with one \ character.

In MemSQL, escapes in patterns also begin with one \ character. Additionally, a \ literal in an MemSQL string needs to escaped with a leading \. Hence, pattern strings containing escapes use two \ characters. For example, SELECT REGEXP_INSTR('dog58','\\d');

Was this article useful?