The stopword list is loaded and searched for full-text queries
using the server character set and collation (the values of the
character_set_server
and
collation_server
system
variables). False hits or misses might occur for stopword
lookups if the stopword file or columns used for full-text
indexing or searches have a character set or collation different
from character_set_server
or
collation_server
.
Case sensitivity of stopword lookups depends on the server
collation. For example, lookups are case-insensitive if the
collation is utf8mb4_0900_ai_ci
, whereas
lookups are case-sensitive if the collation is
utf8mb4_0900_as_cs
or
utf8mb4_bin
.
InnoDB
has a relatively short list of
default stopwords, because documents from technical, literary,
and other sources often use short words as keywords or in
significant phrases. For example, you might search for
“to be or not to be” and expect to get a sensible
result, rather than having all those words ignored.
To see the default InnoDB
stopword list,
query the Information Schema
INNODB_FT_DEFAULT_STOPWORD
table.
Press CTRL+C to copymysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD; +-------+ | value | +-------+ | a | | about | | an | | are | | as | | at | | be | | by | | com | | de | | en | | for | | from | | how | | i | | in | | is | | it | | la | | of | | on | | or | | that | | the | | this | | to | | was | | what | | when | | where | | who | | will | | with | | und | | the | | www | +-------+ 36 rows in set (0.00 sec)
To define your own stopword list for all
InnoDB
tables, define a table with the same
structure as the
INNODB_FT_DEFAULT_STOPWORD
table,
populate it with stopwords, and set the value of the
innodb_ft_server_stopword_table
option to a value in the form
before creating the full-text index. The stopword table must
have a single db_name
/table_name
VARCHAR
column
named value
. The following example
demonstrates creating and configuring a new global stopword
table for InnoDB
.
Press CTRL+C to copy-- Create a new stopword table mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB; Query OK, 0 rows affected (0.01 sec) -- Insert stopwords (for simplicity, a single stopword is used in this example) mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael'); Query OK, 1 row affected (0.00 sec) -- Create the table mysql> CREATE TABLE opening_lines ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200) ) ENGINE=InnoDB; Query OK, 0 rows affected (0.01 sec) -- Insert data into the table mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES ('Call me Ishmael.','Herman Melville','Moby-Dick'), ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'), ('I am an invisible man.','Ralph Ellison','Invisible Man'), ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'), ('It was love at first sight.','Joseph Heller','Catch-22'), ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'), ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'), ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451'); Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 -- Set the innodb_ft_server_stopword_table option to the new stopword table mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords'; Query OK, 0 rows affected (0.00 sec) -- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined) mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line); Query OK, 0 rows affected, 1 warning (1.17 sec) Records: 0 Duplicates: 0 Warnings: 1
Verify that the specified stopword ('Ishmael') does not appear
by querying the Information Schema
INNODB_FT_INDEX_TABLE
table.
By default, words less than 3 characters in length or
greater than 84 characters in length do not appear in an
InnoDB
full-text search index. Maximum
and minimum word length values are configurable using the
innodb_ft_max_token_size
and
innodb_ft_min_token_size
variables. This default behavior does not apply to the ngram
parser plugin. ngram token size is defined by the
ngram_token_size
option.
Press CTRL+C to copymysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15; +-----------+ | word | +-----------+ | across | | all | | burn | | buy | | call | | comes | | dalloway | | first | | flowers | | happened | | herself | | invisible | | less | | love | | man | +-----------+ 15 rows in set (0.00 sec)
To create stopword lists on a table-by-table basis, create
other stopword tables and use the
innodb_ft_user_stopword_table
option to specify the stopword table that you want to use
before you create the full-text index.
The stopword file is loaded and searched using
latin1
if
character_set_server
is
ucs2
, utf16
,
utf16le
, or utf32
.
To override the default stopword list for MyISAM tables, set
the ft_stopword_file
system
variable. (See Section 7.1.8, “Server System Variables”.) The
variable value should be the path name of the file containing
the stopword list, or the empty string to disable stopword
filtering. The server looks for the file in the data directory
unless an absolute path name is given to specify a different
directory. After changing the value of this variable or the
contents of the stopword file, restart the server and rebuild
your FULLTEXT
indexes.
The stopword list is free-form, separating stopwords with any
nonalphanumeric character such as newline, space, or comma.
Exceptions are the underscore character (_
)
and a single apostrophe ('
) which are
treated as part of a word. The character set of the stopword
list is the server's default character set; see
Section 12.3.2, “Server Character Set and Collation”.
The following list shows the default stopwords for
MyISAM
search indexes. In a MySQL source
distribution, you can find this list in the
storage/myisam/ft_static.c
file.
Press CTRL+C to copya's able about above according accordingly across actually after afterwards again against ain't all allow allows almost alone along already also although always am among amongst an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate are aren't around as aside ask asking associated at available away awfully be became because become becomes becoming been before beforehand behind being believe below beside besides best better between beyond both brief but by c'mon c's came can can't cannot cant cause causes certain certainly changes clearly co com come comes concerning consequently consider considering contain containing contains corresponding could couldn't course currently definitely described despite did didn't different do does doesn't doing don't done down downwards during each edu eg eight either else elsewhere enough entirely especially et etc even ever every everybody everyone everything everywhere ex exactly example except far few fifth first five followed following follows for former formerly forth four from further furthermore get gets getting given gives go goes going gone got gotten greetings had hadn't happens hardly has hasn't have haven't having he he's hello help hence her here here's hereafter hereby herein hereupon hers herself hi him himself his hither hopefully how howbeit however i'd i'll i'm i've ie if ignored immediate in inasmuch inc indeed indicate indicated indicates inner insofar instead into inward is isn't it it'd it'll it's its itself just keep keeps kept know known knows last lately later latter latterly least less lest let let's like liked likely little look looking looks ltd mainly many may maybe me mean meanwhile merely might more moreover most mostly much must my myself name namely nd near nearly necessary need needs neither never nevertheless new next nine no nobody non none noone nor normally not nothing novel now nowhere obviously of off often oh ok okay old on once one ones only onto or other others otherwise ought our ours ourselves out outside over overall own particular particularly per perhaps placed please plus possible presumably probably provides que quite qv rather rd re really reasonably regarding regardless regards relatively respectively right said same saw say saying says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several shall she should shouldn't since six so some somebody somehow someone something sometime sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t's take taken tell tends th than thank thanks thanx that that's thats the their theirs them themselves then thence there there's thereafter thereby therefore therein theres thereupon these they they'd they'll they're they've think third this thorough thoroughly those though three through throughout thru thus to together too took toward towards tried tries truly try trying twice two un under unfortunately unless unlikely until unto up upon us use used useful uses using usually value various very via viz vs want wants was wasn't way we we'd we'll we're we've welcome well went were weren't what what's whatever when whence whenever where where's whereafter whereas whereby wherein whereupon wherever whether which while whither who who's whoever whole whom whose why will willing wish with within without won't wonder would wouldn't yes yet you you'd you'll you're you've your yours yourself yourselves zero