Table 12.14 Regular Expression Functions and Operators
Name | Description |
---|---|
NOT REGEXP |
Negation of REGEXP |
REGEXP |
Whether string matches regular expression |
RLIKE |
Whether string matches regular expression |
A regular expression is a powerful way of specifying a pattern for a complex search. This section discusses the operators available for regular expression matching and illustrates, with examples, some of the special characters and constructs that can be used for regular expression operations. See also Section 3.3.4.7, “Pattern Matching”.
MySQL uses Henry Spencer's implementation of regular
expressions, which is aimed at conformance with POSIX 1003.2.
MySQL uses the extended version to support regular expression
pattern-matching operations in SQL statements. This section does
not contain all the details that can be found in Henry Spencer's
regex(7)
manual page. That manual page is
included in MySQL source distributions, in the
regex.7
file under the
regex
directory.
,expr
NOT REGEXPpat
expr
NOT RLIKEpat
This is the same as
NOT (
.expr
REGEXPpat
)
,expr
REGEXPpat
expr
RLIKEpat
Returns 1 if the string
expr
matches the regular expression specified by the patternpat
, 0 otherwise. If eitherexpr
orpat
isNULL
, the return value isNULL
.RLIKE
is a synonym forREGEXP
.The pattern can be an extended regular expression, the syntax for which is discussed in Regular Expression Syntax. The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
NoteMySQL uses C escape syntax in strings (for example,
\n
to represent the newline character). If you want yourexpr
orpat
argument to contain a literal\
, you must double it. (Unless theNO_BACKSLASH_ESCAPES
SQL mode is enabled, in which case no escape character is used.)Regular expression operations use the character set and collation of the string expression and pattern arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.8.4, “Collation Coercibility in Expressions”. If either argument is a binary string, the arguments are handled in case-sensitive fashion as binary strings.
mysql> SELECT 'Michael!' REGEXP '.*'; +------------------------+ | 'Michael!' REGEXP '.*' | +------------------------+ | 1 | +------------------------+ mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line'; +---------------------------------------+ | 'new*\n*line' REGEXP 'new\\*.\\*line' | +---------------------------------------+ | 0 | +---------------------------------------+ mysql> SELECT 'a' REGEXP '^[a-d]'; +---------------------+ | 'a' REGEXP '^[a-d]' | +---------------------+ | 1 | +---------------------+
WarningThe
REGEXP
andRLIKE
operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
A regular expression describes a set of strings. The simplest
regular expression is one that has no special characters in
it. For example, the regular expression
hello
matches hello
and
nothing else.
Nontrivial regular expressions use certain special constructs
so that they can match more than one string. For example, the
regular expression hello|world
contains the
|
alternation operator and matches either
the hello
or world
.
As a more complex example, the regular expression
B[an]*s
matches any of the strings
Bananas
, Baaaaas
,
Bs
, and any other string starting with a
B
, ending with an s
, and
containing any number of a
or
n
characters in between.
A regular expression for the
REGEXP
operator may use any of
the following special characters and constructs:
^
Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0 mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
$
Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1 mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
.
Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1 mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
a*
Match any sequence of zero or more
a
characters.mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
a+
Match any sequence of one or more
a
characters.mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
a?
Match either zero or one
a
character.mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
de|abc
Alternation; match either of the sequences
de
orabc
.mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0 mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
(abc)*
Match zero or more instances of the sequence
abc
.mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1 mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0 mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
{1}
,{2,3}
Repetition;
{
andn
}{
notation provide a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern.m
,n
}m
andn
are integers.a*
Can be written as
a{0,}
.a+
Can be written as
a{1,}
.a?
Can be written as
a{0,1}
.
To be more precise,
a{
matches exactlyn
}n
instances ofa
.a{
matchesn
,}n
or more instances ofa
.a{
matchesm
,n
}m
throughn
instances ofa
, inclusive. If bothm
andn
are given,m
must be less than or equal ton
.m
andn
must be in the range from0
toRE_DUP_MAX
(default 255), inclusive.mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0 mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1 mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
[a-dX]
,[^a-dX]
Matches any character that is (or is not, if
^
is used) eithera
,b
,c
,d
orX
. A-
character between two other characters forms a range that matches all characters from the first character to the second. For example,[0-9]
matches any decimal digit. To include a literal]
character, it must immediately follow the opening bracket[
. To include a literal-
character, it must be written first or last. Any character that does not have a defined special meaning inside a[]
pair matches only itself.mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0 mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1 mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
[.characters.]
Within a bracket expression (written using
[
and]
), matches the sequence of characters of that collating element.characters
is either a single character or a character name likenewline
. The following table lists the permissible character names.The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name Character Name Character NUL
0
SOH
001
STX
002
ETX
003
EOT
004
ENQ
005
ACK
006
BEL
007
alert
007
BS
010
backspace
'\b'
HT
011
tab
'\t'
LF
012
newline
'\n'
VT
013
vertical-tab
'\v'
FF
014
form-feed
'\f'
CR
015
carriage-return
'\r'
SO
016
SI
017
DLE
020
DC1
021
DC2
022
DC3
023
DC4
024
NAK
025
SYN
026
ETB
027
CAN
030
EM
031
SUB
032
ESC
033
IS4
034
FS
034
IS3
035
GS
035
IS2
036
RS
036
IS1
037
US
037
space
' '
exclamation-mark
'!'
quotation-mark
'"'
number-sign
'#'
dollar-sign
'$'
percent-sign
'%'
ampersand
'&'
apostrophe
'\''
left-parenthesis
'('
right-parenthesis
')'
asterisk
'*'
plus-sign
'+'
comma
','
hyphen
'-'
hyphen-minus
'-'
period
'.'
full-stop
'.'
slash
'/'
solidus
'/'
zero
'0'
one
'1'
two
'2'
three
'3'
four
'4'
five
'5'
six
'6'
seven
'7'
eight
'8'
nine
'9'
colon
':'
semicolon
';'
less-than-sign
'<'
equals-sign
'='
greater-than-sign
'>'
question-mark
'?'
commercial-at
'@'
left-square-bracket
'['
backslash
'\\'
reverse-solidus
'\\'
right-square-bracket
']'
circumflex
'^'
circumflex-accent
'^'
underscore
'_'
low-line
'_'
grave-accent
'`'
left-brace
'{'
left-curly-bracket
'{'
vertical-line
'|'
right-brace
'}'
right-curly-bracket
'}'
tilde
'~'
DEL
177
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1 mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
[=character_class=]
Within a bracket expression (written using
[
and]
),[=character_class=]
represents an equivalence class. It matches all characters with the same collation value, including itself. For example, ifo
and(+)
are the members of an equivalence class,[[=o=]]
,[[=(+)=]]
, and[o(+)]
are all synonymous. An equivalence class may not be used as an endpoint of a range.[:character_class:]
Within a bracket expression (written using
[
and]
),[:character_class:]
represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in thectype(3)
manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.Character Class Name Meaning alnum
Alphanumeric characters alpha
Alphabetic characters blank
Whitespace characters cntrl
Control characters digit
Digit characters graph
Graphic characters lower
Lowercase alphabetic characters print
Graphic or space characters punct
Punctuation characters space
Space, tab, newline, and carriage return upper
Uppercase alphabetic characters xdigit
Hexadecimal digit characters mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1 mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
[[:<:]]
,[[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the
alnum
class or an underscore (_
).mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1 mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The
MySQL parser interprets one of the backslashes, and the
regular expression library interprets the other. For example,
to match the string 1+2
that contains the
special +
character, only the last of the
following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1