Table 12.9 String Regular Expression Operators
| Name | Description |
|---|---|
NOT REGEXP | Negation of REGEXP |
REGEXP | Pattern matching using regular expressions |
RLIKE | Synonym for REGEXP |
A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular
expressions, which is aimed at conformance with POSIX 1003.2.
MySQL uses the extended version to support pattern-matching
operations performed with the
REGEXP operator in SQL statements.
This section summarizes, with examples, the special characters
and constructs that can be used in MySQL for
REGEXP operations. It does not
contain all the details that can be found in Henry Spencer's
regex(7) manual page. That manual page is
included in MySQL source distributions, in the
regex.7 file under the
regex directory. See also
Section 3.3.4.7, “Pattern Matching”.
Regular Expression Operators
,exprNOT REGEXPpatexprNOT RLIKEpatThis is the same as
NOT (.exprREGEXPpat),exprREGEXPpatexprRLIKEpatPerforms a pattern match of a string expression
expragainst a patternpat. The pattern can be an extended regular expression, the syntax for which is discussed later in this section. Returns1ifexprmatchespat; otherwise it returns0. If eitherexprorpatisNULL, the result isNULL.RLIKEis a synonym forREGEXP.The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
NoteBecause MySQL uses the C escape syntax in strings (for example,
\nto represent the newline character), you must double any\that you use in yourREGEXPstrings.REGEXPis not case sensitive, except when used with binary strings.mysql> SELECT 'Michael!' REGEXP '.*'; -> 1 mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line'; -> 1 mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A'; -> 1 0 mysql> SELECT 'a' REGEXP '^[a-d]'; -> 1REGEXPandRLIKEuse the character set and collations of the arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.1.8.4, “Collation Coercibility in Expressions”.WarningThe
REGEXPandRLIKEoperators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
Syntax of Regular Expressions
A regular expression describes a set of strings. The simplest
regular expression is one that has no special characters in it.
For example, the regular expression hello
matches hello and nothing else.
Nontrivial regular expressions use certain special constructs so
that they can match more than one string. For example, the
regular expression hello|word matches either
the string hello or the string
word.
As a more complex example, the regular expression
B[an]*s matches any of the strings
Bananas, Baaaaas,
Bs, and any other string starting with a
B, ending with an s, and
containing any number of a or
n characters in between.
A regular expression for the REGEXP
operator may use any of the following special characters and
constructs:
^Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0 mysql> SELECT 'fofo' REGEXP '^fo'; -> 1$Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1 mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0.Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1 mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1a*Match any sequence of zero or more
acharacters.mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1a+Match any sequence of one or more
acharacters.mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0a?Match either zero or one
acharacter.mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0de|abcMatch either of the sequences
deorabc.mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0 mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0(abc)*Match zero or more instances of the sequence
abc.mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1 mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0 mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1{1},{2,3}{n}or{m,n}notation provides a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern.mandnare integers.a*Can be written as
a{0,}.a+Can be written as
a{1,}.a?Can be written as
a{0,1}.
To be more precise,
a{n}matches exactlyninstances ofa.a{n,}matchesnor more instances ofa.a{m,n}matchesmthroughninstances ofa, inclusive.mandnmust be in the range from0toRE_DUP_MAX(default 255), inclusive. If bothmandnare given,mmust be less than or equal ton.mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0 mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1 mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1[a-dX],[^a-dX]Matches any character that is (or is not, if ^ is used) either
a,b,c,dorX. A-character between two other characters forms a range that matches all characters from the first character to the second. For example,[0-9]matches any decimal digit. To include a literal]character, it must immediately follow the opening bracket[. To include a literal-character, it must be written first or last. Any character that does not have a defined special meaning inside a[]pair matches only itself.mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0 mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1 mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0[.characters.]Within a bracket expression (written using
[and]), matches the sequence of characters of that collating element.charactersis either a single character or a character name likenewline. The following table lists the permissible character names.The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name Character Name Character NUL0SOH001STX002ETX003EOT004ENQ005ACK006BEL007alert007BS010backspace'\b'HT011tab'\t'LF012newline'\n'VT013vertical-tab'\v'FF014form-feed'\f'CR015carriage-return'\r'SO016SI017DLE020DC1021DC2022DC3023DC4024NAK025SYN026ETB027CAN030EM031SUB032ESC033IS4034FS034IS3035GS035IS2036RS036IS1037US037space' 'exclamation-mark'!'quotation-mark'"'number-sign'#'dollar-sign'$'percent-sign'%'ampersand'&'apostrophe'\''left-parenthesis'('right-parenthesis')'asterisk'*'plus-sign'+'comma','hyphen'-'hyphen-minus'-'period'.'full-stop'.'slash'/'solidus'/'zero'0'one'1'two'2'three'3'four'4'five'5'six'6'seven'7'eight'8'nine'9'colon':'semicolon';'less-than-sign'<'equals-sign'='greater-than-sign'>'question-mark'?'commercial-at'@'left-square-bracket'['backslash'\\'reverse-solidus'\\'right-square-bracket']'circumflex'^'circumflex-accent'^'underscore'_'low-line'_'grave-accent'`'left-brace'{'left-curly-bracket'{'vertical-line'|'right-brace'}'right-curly-bracket'}'tilde'~'DEL177mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1 mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1[=character_class=]Within a bracket expression (written using
[and]),[=character_class=]represents an equivalence class. It matches all characters with the same collation value, including itself. For example, ifoand(+)are the members of an equivalence class,[[=o=]],[[=(+)=]], and[o(+)]are all synonymous. An equivalence class may not be used as an endpoint of a range.[:character_class:]Within a bracket expression (written using
[and]),[:character_class:]represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in thectype(3)manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.Character Class Name Meaning alnumAlphanumeric characters alphaAlphabetic characters blankWhitespace characters cntrlControl characters digitDigit characters graphGraphic characters lowerLowercase alphabetic characters printGraphic or space characters punctPunctuation characters spaceSpace, tab, newline, and carriage return upperUppercase alphabetic characters xdigitHexadecimal digit characters mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1 mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0[[:<:]],[[:>:]]These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the
alnumclass or an underscore (_).mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1 mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The
MySQL parser interprets one of the backslashes, and the regular
expression library interprets the other. For example, to match
the string 1+2 that contains the special
+ character, only the last of the following
regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1
WHERE phone REGEXP '(435)';
would return any phone numbers that have the sequence 435 in any part of the string, such as "1(801)555-4351". However:
WHERE phone REGEXP '[(]435[)]';
would return only phone numbers with (435), such as "1(435)555-5555".
SELECT * FROM foo WHERE bar REGEXP "baz"
To match a "special" character such as $, you need to prefix it with the backslash \ character. So \$ matches an actual dollar sign. However, in almost any programming language that claims a "C-like" syntax, that backslash is likely to get picked up on as a special character. So you may need to use an extra backslash. Also, at least in Perl and PHP, the $ is a special character itself, because it indicates that what follows is a variable name - so it will need a backslash too.
In Perl or PHP, you probably will write something like this to match on a line starting with a $ sign:
$query = "SELECT * FROM `foo` WHERE `bar` REGEXP \"^\\\$\""
I'll explain the special characters in that and what they mean:
backslash, speech mark = a literal speech mark
HAT sign = beginning of line
two backslashes = a literal backslash
backslash, dollar = a literal dollar sign
backslash, speech mark = a literal speech mark
Now if you print $query, it will have the value
SELECT * FROM `foo` WHERE `bar` REGEXP "^\$"
which is what you really want, and how you would type it into the mysql command line. Remember also that PHPMyAdmin expects you to put a backslash before a backslash or apostrophe. So in PHPMyAdmin you would enter
SELECT * FROM `foo` WHERE `bar` REGEXP "^\\$"
I guess if you only want to use . and .* regular expressions, you may as well stick to using LIKE with the _ and % wildcards, as that is probably a bit faster. Finally, when using regular expressions in *any* language you need to watch out, because it is very easy to write ones that will always match, and almost as easy to write ones that will never match anything. So do check!
Also, don't get in the habit of using character classes to escape metacharacters (like using [(] to match a literal parenthese). The open parentheses '(' and ')' have no special meaning inside a character class, but others like $ retain their meaning. Using [$] won't match the character '$', it'll still match the beginning of the string.
So escape them properly with backslashes.
So what if your PHP code has something that looks like...
mysql_query("select * from a where Name rlike '(^|//////|)example'")
...just to match a string that starts with "example" or contains the string "|example". It looks terrible and it works.
SELECT * FROM site WHERE html REGEXP "id=[[.apostrophe.][.quotation-mark.]]archives[[.apostrophe.][.quotation-mark.]]";
Ta da. No escaping issues.
SELECT 'WORD' REGEXP '[[:upper:]]{4}'; # => 1;
SELECT 'WORD' REGEXP '[[:lower:]]{4}'; # => 0
work right only when collate is _cs and NOT _ci (case insensitive)
created tables eg. the collate 'latin1_swedish_ci' have to be changed, if you want to use case sensitive REGEXPs like [[:upper:]] or [[:lower:]]!
ALTER TABLE <name> CONVERT TO CHARACTER SET latin1 COLLATE latin_general_cs
I set in my.cnf now:
[mysqld]
default-collation= latin1_general_cs
#default was latin1_swedish_ci
Here's an extended example (based on the one previously posted):
SELECT
('WORD' REGEXP '[[:upper:]]{4}') AS `upper_match`, # this will be a 1
('WORD' REGEXP '[[:lower:]]{4}') AS `lower_match`, # this will be a 1 on an "*_ci" collation
# -BINARY- matches below
(BINARY 'WORD' REGEXP '[[:upper:]]{4}') AS `bin_upper_match`, # this will be a 1
(BINARY 'WORD' REGEXP '[[:lower:]]{4}') AS `bin_lower_match` # this will be a 0 even on an "*_ci" collation
To avoid this use one of the following Methods:
SELECT * FROM table WHERE text COLLATE latin1_general_cs REGEXP '...'
OR
SELECT * FROM table WHERE CAST(x AS BINARY) REGEXP '...'
This information can't be find on the dev.mysql.com site.
If you are coding in PHP and you need to match a literal backslash you can easily end up with an unmaintainable mess of \\\\\\\\'s.
Try to keep it clean applying the appropriate escaping functions:
There doesn't seem to be a native quote funcion in MySQL but PHP's preg_quote() apparently works.
regular expressions do not utilize indexes. the only way to improve their performance is to write a more efficient expression.
-place the fixed/bigger part first or "fail fast", for ex: instead of (e|abcd) use (abcd|e)
-extract similitudes, ex: instead of (abcd|abef) use ab(cd|ef)
-prefere LIKE with % and _ , or string functions if the match is simple
-try to create less cycles(regex use a lot backtracking)!!!
Hope it helps!
Here's my simple workaround, for a database, server, and current connection that are entirely in UTF-8. Of course this only helps the majority of us developers who are in fact dealing with stored data that could just as well have be expressed in latin1:
SELECT * FROM YourTables WHERE (CONVERT (TextField USING latin1)) REGEXP CONVERT ('YourUTF8RegExp' USING latin1))
While creating a search function with syntax-highlighting for content which include bulletin board codes ([code], [ul], [li], etc.) I was stuck for a few hours about how to exclude the contents of the BBC tags. For example, if someone search for "ode", the [code] tags won't needed in the result list, not to mention that the syntax highlighting destroyed the html code as well :)
I was only able to do it this way ($search is the search string, passed from php):
SELECT * FROM table
WHERE LOWER(content) LIKE '%$search%'
AND content REGEXP '[\]].[^\[]*$search'
I wanted my $search to be found only after "]", and only if there is no "[" between them. Other characters are allowed.
SELECT * FROM my_table
WHERE the_column REGEXP '[^[.NUL.]-[.DEL.]]'
Alternately, if you want to exclude control characters as well:
SELECT * FROM my_table
WHERE the_column REGEXP '[^ -~]'
The sheet is color coded and is very easy to read.
(?!^11$)^\d+$
But with their implementation, I have to make a "do not match" field as well as a "match" field to make sure my conditions are met. :/
^([0-9]|[02-9][02-9]|[0-9][02-9]|[02-9][0-9]|[0-9]{3,})$
:)
SELECT * FROM REG where seg_3 REGEXP 'SHA|S11|A11|SH4|SH4H|H4H|5HA|5HAH|HAH'
SNUSHAH
To get around this, I changed the string being searched for, in code, to: "[:space:]*L[:space:]*o[:space:]*u[:space:]*D[:space:]*o[:space:]*g[:space:]*" but this wasn't working.
Eventually someone spotted that I was essentially searching for zero-or-more instances of the characters :, s, p, a, c or e. This search should instead have been: "[[:space:]]*L[[:space:]]*o[[:space:]]*u[[:space:]]*D[[:space:]]*o[[:space:]]*g[[:space:]]*"
It's not clear in the examples above that this is what you should be searching for. This is roughly equivelent to "/\s+L\s+o\s+u\s+D\s+o\s+g\s+/i"
Using ORed decompositions is fine for my simple example, but in actual data, the length of the decompositions may be well beyond the point where they would be worth doing over just using two fields... or better yet if MySQL would implement the negation operator ?!.
Also, decompositions are much less clear in what they are doing. It's pretty obvious what the negation operator is doing. When I have to revisit a regex 2 years from now to add something to it, I'm going to want to smack someone that used a decomp.
return sValue regexp '^-?[0-9]+[.]?[0-9]*$|^-?[.][0-9]+$';
or
return sValue regexp '^-?[0-9]+$';
where sValue is a char argument.
1) A pretty neutral cheat sheet for regexes.
The download is for free.
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
and there is a nicer-looking one for a symbolic fee, here:
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
2) While we are at it, here's a MySQL cheat sheet by the same author:
http://www.addedbytes.com/download/mysql-cheat-sheet-v1/pdf/
3) Another regex cheat sheet:
http://www.regular-expressions.info/reference.html
Hope this is helpful.
<-- minimum salaries expected rounded to the nearest whole number, and the minimum of the minimum salaries expected rounded to the nearest whole number.-->
SELECT SUBSTR(qtrCode, 1, 4) AS Year,
ROUND(AVG(minsal)) "Average Minimum Salary",
ROUND(MIN(minsal)) "Minimum Offered"
FROM quarter
GROUP BY Year;
<-- state descriptions that contain “or” (MUST use the instr function to do this) and the count of the locations-->
<-- desired (from the quarter table) in that state. Be sure to list ALL states.-->
SELECT Description, COUNT(location)as 'Number of locations'
FROM state LEFT JOIN quarter ON state.statecode=quarter.location
WHERE INSTR(description,'or')>0
GROUP BY description;
<--For each quarter, list the full statename, the company name and the division for interviews in locations that match -->
SELECT q.qtrcode "QTR", s.description "State",e.companyname "Company",
e.division "Division", i.interviewdate "Date"
FROM quarter q, state s, employer e, interview i
WHERE e.companyname=i.companyname AND
e.division=i.division AND
q.qtrcode=i.qtrcode AND
e.statecode=q.location AND
e.statecode=s.statecode;