FULLTEXT indexes are created on text-based
TEXT columns) to help speed up
queries and DML operations on data contained within those
columns, omitting any words that are defined as stopwords.
have an inverted index design. Inverted indexes store a list of
words, and for each word, a list of documents that the word
appears in. To support proximity search, position information
for each word is also stored, as a byte offset.
index, a set of index tables is created, as shown in the
CREATE TABLE opening_lines ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200), FULLTEXT idx (opening_line) ) ENGINE=InnoDB; mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE name LIKE 'test/%'; +----------+----------------------------------------------------+-------+ | table_id | name | space | +----------+----------------------------------------------------+-------+ | 333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 | 289 | | 334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 | 290 | | 335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 | 291 | | 336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 | 292 | | 337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 | 293 | | 338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 | 294 | | 330 | test/FTS_0000000000000147_BEING_DELETED | 286 | | 331 | test/FTS_0000000000000147_BEING_DELETED_CACHE | 287 | | 332 | test/FTS_0000000000000147_CONFIG | 288 | | 328 | test/FTS_0000000000000147_DELETED | 284 | | 329 | test/FTS_0000000000000147_DELETED_CACHE | 285 | | 327 | test/opening_lines | 283 | +----------+----------------------------------------------------+-------+ 12 rows in set (0.02 sec)
The first six tables represent the inverted index and are
referred to as auxiliary index tables. When incoming documents
are tokenized, the individual words (also referred to as
“tokens”) are inserted into the index tables along
with position information and the associated Document ID
DOC_ID). The words are fully sorted and
partitioned among the six index tables based on the character
set sort weight of the word's first character.
The inverted index is partitioned into six auxiliary index
tables to support parallel index creation. By default, two
threads tokenize, sort, and insert words and associated data
into the index tables. The number of threads is configurable
option. When creating
FULLTEXT indexes on
large tables, consider increasing the number of threads.
Auxiliary index table names are prefixed with
FTS_ and postfixed with
INDEX_*. Each index table is associated with
the indexed table by a hex value in the index table name that
table_id of the indexed table.
For example, the
table_id of the
test/opening_lines table is
327, for which the hex value is 0x147. As
shown in the preceding example, the “147” hex value
appears in the names of index tables that are associated with
A hex value representing the
index_id of the
FULLTEXT index also appears in auxiliary
index table names. For example, in the auxiliary table name
the hex value
1c9 has a decimal value of 457.
The index defined on the
idx) can be identified by querying the
table for this value (457).
mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_SYS_INDEXES WHERE index_id=457; +----------+------+----------+-------+ | index_id | name | table_id | space | +----------+------+----------+-------+ | 457 | idx | 327 | 283 | +----------+------+----------+-------+ 1 row in set (0.00 sec)
Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace.
Due to a bug introduced in MySQL 5.6.5, index tables were
created in the
InnoDB system tablespace
(space 0) even though the primary table was created in a
file-per-table tablespace. The bug is fixed in MySQL 5.6.20
and MySQL 5.7.5 (Bug#18635485). As of MySQL 5.7.8, auxiliary
index tables are always stored in the same tablespace as the
primary table and have the same row format as the primary
table (MySQL Bug #75869).
The other index tables shown in the preceding example are used
for deletion handling and for storing the internal state of the
FTS_*_DELETED_CACHE: Contain the document
IDs (DOC_ID) for documents that are deleted but whose data
is not yet removed from the full-text index. The
FTS_*_DELETED_CACHE is the in-memory
version of the
FTS_*_BEING_DELETED_CACHE: Contain the
document IDs (DOC_ID) for documents that are deleted and
whose data is currently in the process of being removed from
the full-text index. The
FTS_*_BEING_DELETED_CACHE table is the
in-memory version of the
FTS_*_CONFIG: Stores information about
the internal state of the
Most importantly, it stores the
FTS_SYNCED_DOC_ID, which identifies
documents that have been parsed and flushed to disk. In case
of crash recovery,
values are used to identify documents that have not been
flushed to disk so that the documents can be re-parsed and
added back to the
FULLTEXT index cache.
To view the data in this table, query the
When a document is inserted, it is tokenized, and the individual
words and associated data are inserted into the
FULLTEXT index. This process, even for small
documents, could result in numerous small insertions into the
auxiliary index tables, making concurrent access to these tables
a point of contention. To avoid this problem,
InnoDB uses a
index cache to temporarily cache index table insertions for
recently inserted rows. This in-memory cache structure holds
insertions until the cache is full and then batch flushes them
to disk (to the auxiliary index tables). You can query the
table to view tokenized data for recently inserted rows.
The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.
variable is used to configure the full-text index cache size (on
a per-table basis), which affects how often the full-text index
cache is flushed. You can also define a global full-text index
cache size limit for all tables in a given instance using the
The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.
InnoDB uses a unique document identifier
referred to as a Document ID (
DOC_ID) to map
words in the full-text index to document records where the word
appears. The mapping requires an
column on the indexed table. If an
column is not defined,
adds a hidden
FTS_DOC_ID column when the
full-text index is created. The following example demonstrates
The following table definition does not include an
CREATE TABLE opening_lines ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200) ) ENGINE=InnoDB;
When you create a full-text index on the table using
CREATE FULLTEXT INDEX syntax, a warning is
returned which reports that
rebuilding the table to add the
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line); Query OK, 0 rows affected, 1 warning (0.19 sec) Records: 0 Duplicates: 0 Warnings: 1 mysql> SHOW WARNINGS; +---------+------+--------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------+ | Warning | 124 | InnoDB rebuilding table to add column FTS_DOC_ID | +---------+------+--------------------------------------------------+ 1 row in set (0.00 sec)
The same warning is returned when using
ALTER TABLE to add a full-text
index to a table that does not have an
FTS_DOC_ID column. If you create a full-text
CREATE TABLE time and do
not specify an
InnoDB adds a hidden
FTS_DOC_ID column, without warning.
FTS_DOC_ID column at
CREATE TABLE time reduces the
time required to create a full-text index on a table that is
already loaded with data. If an
column is defined on a table prior to loading data, the table
and its indexes do not have to be rebuilt to add the new column.
If you are not concerned with
INDEX performance, leave out the
FTS_DOC_ID column to have
InnoDB create it for you.
InnoDB creates a hidden
FTS_DOC_ID column along with a unique index
FTS_DOC_ID_INDEX) on the
FTS_DOC_ID column. If you want to create your
FTS_DOC_ID column, the column must be
BIGINT UNSIGNED NOT NULL and named
FTS_DOC_ID (all upper case), as in the
FTS_DOC_ID column does not need to be
defined as an
AUTO_INCREMENT column but
AUTO_INCREMENT could make loading data
CREATE TABLE opening_lines ( FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200) ) ENGINE=InnoDB;
If you choose to define the
yourself, you are responsible for managing the column to avoid
empty or duplicate values.
cannot be reused, which means
values must be ever increasing.
Optionally, you can create the required unique
FTS_DOC_ID_INDEX (all upper case) on the
CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);
If you do not create the
InnoDB creates it automatically.
Deleting a record that has a full-text index column could result
in numerous small deletions in the auxiliary index tables,
making concurrent access to these tables a point of contention.
To avoid this problem, the Document ID
DOC_ID) of a deleted document is logged in a
FTS_*_DELETED table whenever a record
is deleted from an indexed table, and the indexed record remains
in the full-text index. Before returning query results,
information in the
FTS_*_DELETED table is
used to filter out deleted Document IDs. The benefit of this
design is that deletions are fast and inexpensive. The drawback
is that the size of the index is not immediately reduced after
deleting records. To remove full-text index entries for deleted
records, you must run
OPTIMIZE TABLE on the
indexed table with
to rebuild the full-text index. For more information, see
Optimizing InnoDB Full-Text Indexes.
have special transaction handling characteristics due its
caching and batch processing behavior. Specifically, updates and
insertions on a
FULLTEXT index are processed
at transaction commit time, which means that a
FULLTEXT search can only see committed data.
The following example demonstrates this behavior. The
FULLTEXT search only returns a result after
the inserted lines are committed.
mysql> CREATE TABLE opening_lines ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200), FULLTEXT idx (opening_line) ) ENGINE=InnoDB; mysql> BEGIN; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES ('Call me Ishmael.','Herman Melville','Moby-Dick'), ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'), ('I am an invisible man.','Ralph Ellison','Invisible Man'), ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'), ('It was love at first sight.','Joseph Heller','Catch-22'), ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'), ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'), ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451'); Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael'); +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql> COMMIT; Query OK, 0 rows affected (0.00 sec) mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael'); +----------+ | COUNT(*) | +----------+ | 1 | +----------+ 1 row in set (0.00 sec)
You can monitor and examine the special text-processing aspects
by querying the following
See Section 14.13.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables” for more information.