MySQL has a built-in parser that it uses by default for full-text operations (parsing text to be indexed, or parsing a query string to determine the terms to be used for a search). As of MySQL 5.7.6, MySQL also provides an n-gram full-text parser plugin for Chinese, Japanese, and Korean (CJK), and a MeCab full-text parser plugin for Japanese. For full-text processing, “parsing” means extracting words from text or a query string based on rules that define which character sequences make up a word and where word boundaries lie.
When parsing for indexing purposes, the parser passes each word (or token in the case of an n-gram character-based parser) to the server, which adds it to a full-text index. When parsing a query string, the parser passes each word/token to the server, which accumulates the words/tokens for use in a search.
The parsing properties of the built-in full-text parser are
described in Section 12.9, “Full-Text Search Functions”. These
properties include rules for determining how to extract words
from text. The parser is influenced by certain system
variables that cause words shorter or longer than configured
limits to be excluded (for example, ft_min_word_len and
ft_max_word_len for MyISAM), and by the stopword list that
identifies common words to be ignored.
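As a quick way to see the length limits currently in effect, the following sketch lists the relevant system variables. It assumes a MySQL 5.7 server; the LIKE patterns are only a convenience for matching the ft_min_word_len/ft_max_word_len (MyISAM) and innodb_ft_min_token_size/innodb_ft_max_token_size (InnoDB) variables.

```sql
-- Word-length limits applied by the built-in parser for MyISAM indexes
SHOW VARIABLES LIKE 'ft_%_word_len';

-- Token-size limits applied for InnoDB full-text indexes
SHOW VARIABLES LIKE 'innodb_ft_%_token_size';
```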
The plugin API enables you to use a full-text parser other than the default built-in full-text parser. The plugin API also enables you to provide a full-text parser of your own so that you have control over the basic duties of a parser. A parser plugin can operate in either of two roles:
The plugin can replace the built-in parser. In this role, the plugin reads the input to be parsed, splits it up into words/tokens, and passes the words/tokens to the server (either for indexing or for word/token accumulation).
One reason to use a parser this way is that you need to use different rules from those of the built-in parser for determining how to split up input into words. For example, the built-in parser considers the text “case-sensitive” to consist of two words “case” and “sensitive,” whereas an application might need to treat the text as a single word. You could also be working with languages such as Chinese and Japanese that do not have word delimiters. The built-in full-text parser cannot determine where words begin and end in these and other such languages. Parser plugins such as the n-gram parser plugin and MeCab parser plugin (introduced in MySQL 5.7.6) may be better options in this case.
The plugin can act in conjunction with the built-in parser
by serving as a front end for it. In this role, the plugin
extracts text from the input and passes the text to the
parser, which splits up the text into words using its
normal parsing rules. In particular, this parsing will be
affected by the innodb_ft_xxx or ft_xxx
system variables and the stopword list.
One reason to use a parser this way is that you need to
index content such as PDF documents, XML documents, or
.doc files. The built-in parser is
not intended for those types of input but a plugin can
pull out the text from these input sources and pass it to
the built-in parser.
It is also possible for a parser plugin to operate in both roles. That is, it could extract text from nonplaintext input (the front end role), and also parse the text into words (thus replacing the built-in parser).
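For the first role (replacing the built-in parser), the n-gram parser mentioned above can serve as a concrete illustration. The sketch below assumes MySQL 5.7.6 or later, where the ngram parser is available; the table name articles, its columns, and the search string are hypothetical and chosen only for the example.

```sql
-- Hypothetical table whose full-text index is tokenized by the
-- n-gram parser instead of the built-in parser
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4;

-- The query string is tokenized by the same parser as the index
SELECT id, title
  FROM articles
  WHERE MATCH (title, body) AGAINST ('数据库' IN NATURAL LANGUAGE MODE);
```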
A full-text plugin is associated with full-text indexes on a
per-index basis. That is, when you install a parser plugin
initially, that does not cause it to be used for any full-text
operations. It simply becomes available. For example, a
full-text parser plugin becomes available to be named in a
WITH PARSER clause when creating individual
FULLTEXT indexes. To create such an index
at table-creation time, do this:
CREATE TABLE t
(
  doc CHAR(255),
  FULLTEXT INDEX (doc) WITH PARSER my_parser
) ENGINE=InnoDB;
Or you can add the index after the table has been created:
ALTER TABLE t ADD FULLTEXT INDEX (doc) WITH PARSER my_parser;
The only SQL change for associating the parser with the index
is adding a WITH PARSER clause. Searches are
specified as before, with no changes needed for queries.
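To illustrate that queries are unaffected, here is a minimal sketch of searches against the table t defined earlier. The search strings are made up for the example; the statements have the same form whether the index uses my_parser or the built-in parser.

```sql
-- Natural-language search; no parser is named in the query
SELECT doc
  FROM t
  WHERE MATCH (doc) AGAINST ('full-text parser');

-- Boolean-mode search, likewise unchanged
SELECT doc
  FROM t
  WHERE MATCH (doc) AGAINST ('+parser -plugin' IN BOOLEAN MODE);
```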
When you associate a parser plugin with a
FULLTEXT index, the plugin is required for
using the index. If the parser plugin is dropped, any index
associated with it becomes unusable. Any attempt to use a
table for which a plugin is not available results in an error,
although DROP TABLE is still possible.
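The plugin life cycle described above might look like the following sketch. The plugin name my_parser matches the earlier examples, but the library file name my_parser.so is an assumption (it depends on how the plugin was built and on the platform), and the exact error reported for an unavailable plugin is not shown here.

```sql
-- Make the plugin available (library file name is hypothetical)
INSTALL PLUGIN my_parser SONAME 'my_parser.so';

-- Check which full-text parser plugins the server currently knows about
SELECT PLUGIN_NAME, PLUGIN_STATUS
  FROM INFORMATION_SCHEMA.PLUGINS
  WHERE PLUGIN_TYPE = 'FTPARSER';

-- Removing the plugin leaves indexes that name it unusable
UNINSTALL PLUGIN my_parser;

-- A table whose plugin is missing can still be dropped
DROP TABLE t;
```

If the table contents matter, consider dumping the table and removing the WITH PARSER clause from the dumped CREATE TABLE statement before uninstalling the plugin, so that the table can be reloaded later.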
For more information about full-text plugins, see
“Writing Full-Text Parser Plugins”. MySQL 5.7
supports full-text plugins with MyISAM and InnoDB. InnoDB
support for full-text plugins was added in MySQL 5.7.3.