サーバー文字セットおよび照合順序
(character_set_server
および
collation_server
システム変数の値)
を使用すると、全文クエリー用のストップワードリストがロードおよび検索されます。全文インデックス作成または検索で使用されるストップワードファイルまたはカラムに、character_set_server
または collation_server
とは異なる文字セットまたは照合順序が含まれている場合は、ストップワード検索で誤ったヒットまたはミスが発生する可能性があります。
ストップワード検索で大文字と小文字が区別されるかどうかは、サーバー照合順序によって異なります。たとえば、照合順序が
latin1_swedish_ci
の場合は検索で大文字と小文字が区別されませんが、照合順序が
latin1_general_cs
または
latin1_bin
の場合は検索で大文字と小文字が区別されます。
InnoDB 検索インデックスのストップワード
技術的、文学的、およびその他のソースからのドキュメントでは、キーワードとしてまたは重要なフレーズで短い単語が使用されることが多いため、InnoDB
ではデフォルトのストップワードリストが比較的短くなります。たとえば、「to
be or not to be」
を検索し、これらの単語がすべて無視されるのではなく、適切な結果が取得されることを期待するとします。
デフォルトの InnoDB
ストップワードリストを確認するには、INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
テーブルを問い合わせます。
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a |
| about |
| an |
| are |
| as |
| at |
| be |
| by |
| com |
| de |
| en |
| for |
| from |
| how |
| i |
| in |
| is |
| it |
| la |
| of |
| on |
| or |
| that |
| the |
| this |
| to |
| was |
| what |
| when |
| where |
| who |
| will |
| with |
| und |
| the |
| www |
+-------+
36 rows in set (0.00 sec)
すべての InnoDB
テーブルで独自のストップワードリストを定義するには、INNODB_FT_DEFAULT_STOPWORD
テーブルと同じ構造を持つテーブルを定義し、それにストップワードを移入し、innodb_ft_server_stopword_table
オプションの値を
形式の値に設定してから、全文インデックスを作成します。ストップワードテーブルには、db_name
/table_name
value
という名前の単一の
VARCHAR
カラムが含まれている必要があります。次の例では、InnoDB
用に新しいグローバルストップワードテーブルを作成および構成するよう実演します。
-- Create a new stopword table
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)
-- Insert stopwords (for simplicity, a single stopword is used in this example)
mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)
-- Create the table
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)
-- Insert data into the table
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
-- Set the innodb_ft_server_stopword_table option to the new stopword table
mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)
-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0 Duplicates: 0 Warnings: 1
INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
で単語を問い合わせて、指定したストップワード
('Ishmael') が表示されないことを確認します。
デフォルトでは、長さが 3
文字よりも少ない単語または 84
文字よりも多い単語は、InnoDB
の全文検索インデックスに表示されません。単語の最大長および最小長の値は、innodb_ft_max_token_size
および
innodb_ft_min_token_size
変数を使用して構成できます。
mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word |
+-----------+
| across |
| all |
| burn |
| buy |
| call |
| comes |
| dalloway |
| first |
| flowers |
| happened |
| herself |
| invisible |
| less |
| love |
| man |
+-----------+
15 rows in set (0.00 sec)
ストップワードリストをテーブルごとに作成するには、その他のストップワードテーブルを作成し、innodb_ft_user_stopword_table
オプションを使用して使用されるストップワードテーブルを指定してから、全文インデックスを作成します。
MyISAM 検索インデックスのストップワード
MySQL 5.6
では、character_set_server
が
ucs2
、utf16
、utf16le
、または
utf32
の場合、ストップワードファイルが
latin1
を使用してロードおよび検索されます。
MyISAM
テーブル用のデフォルトのストップワードリストをオーバーライドするには、ft_stopword_file
システム変数を設定します。(セクション5.1.4「サーバーシステム変数」を参照してください。)変数の値は、ストップワードリストを含むファイルのパス名、またはストップワードのフィルタ処理が無効になる空の文字列になるようにしてください。サーバーは、別のディレクトリを指定する絶対パス名が指定されないかぎり、データディレクトリ内のファイルを検索します。この変数の値またはストップワードファイルの内容を変更したら、サーバーを再起動し、FULLTEXT
インデックスを再構築してください。
ストップワードリストは自由形式で、改行、空白、カンマなどの英数字以外の文字でストップワードが区切られます。例外として、下線文字
(「_
」)
と単一アポストロフィー
(「'
」)
は単語の一部として処理されます。ストップワードリストの文字セットは、サーバーのデフォルト文字セットです。セクション10.1.3.1「サーバー文字セットおよび照合順序」を参照してください。
次の表には、MyISAM
検索インデックスのデフォルトのストップワードリストを示します。このリストは、MySQL
ソース配布の
storage/myisam/ft_static.c
ファイルで検索できます。
a's | able | about | above | according |
accordingly | across | actually | after | afterwards |
again | against | ain't | all | allow |
allows | almost | alone | along | already |
also | although | always | am | among |
amongst | an | and | another | any |
anybody | anyhow | anyone | anything | anyway |
anyways | anywhere | apart | appear | appreciate |
appropriate | are | aren't | around | as |
aside | ask | asking | associated | at |
available | away | awfully | be | became |
because | become | becomes | becoming | been |
before | beforehand | behind | being | believe |
below | beside | besides | best | better |
between | beyond | both | brief | but |
by | c'mon | c's | came | can |
can't | cannot | cant | cause | causes |
certain | certainly | changes | clearly | co |
com | come | comes | concerning | consequently |
consider | considering | contain | containing | contains |
corresponding | could | couldn't | course | currently |
definitely | described | despite | did | didn't |
different | do | does | doesn't | doing |
don't | done | down | downwards | during |
each | edu | eg | eight | either |
else | elsewhere | enough | entirely | especially |
et | etc | even | ever | every |
everybody | everyone | everything | everywhere | ex |
exactly | example | except | far | few |
fifth | first | five | followed | following |
follows | for | former | formerly | forth |
four | from | further | furthermore | get |
gets | getting | given | gives | go |
goes | going | gone | got | gotten |
greetings | had | hadn't | happens | hardly |
has | hasn't | have | haven't | having |
he | he's | hello | help | hence |
her | here | here's | hereafter | hereby |
herein | hereupon | hers | herself | hi |
him | himself | his | hither | hopefully |
how | howbeit | however | i'd | i'll |
i'm | i've | ie | if | ignored |
immediate | in | inasmuch | inc | indeed |
indicate | indicated | indicates | inner | insofar |
instead | into | inward | is | isn't |
it | it'd | it'll | it's | its |
itself | just | keep | keeps | kept |
know | known | knows | last | lately |
later | latter | latterly | least | less |
lest | let | let's | like | liked |
likely | little | look | looking | looks |
ltd | mainly | many | may | maybe |
me | mean | meanwhile | merely | might |
more | moreover | most | mostly | much |
must | my | myself | name | namely |
nd | near | nearly | necessary | need |
needs | neither | never | nevertheless | new |
next | nine | no | nobody | non |
none | noone | nor | normally | not |
nothing | novel | now | nowhere | obviously |
of | off | often | oh | ok |
okay | old | on | once | one |
ones | only | onto | or | other |
others | otherwise | ought | our | ours |
ourselves | out | outside | over | overall |
own | particular | particularly | per | perhaps |
placed | please | plus | possible | presumably |
probably | provides | que | quite | qv |
rather | rd | re | really | reasonably |
regarding | regardless | regards | relatively | respectively |
right | said | same | saw | say |
saying | says | second | secondly | see |
seeing | seem | seemed | seeming | seems |
seen | self | selves | sensible | sent |
serious | seriously | seven | several | shall |
she | should | shouldn't | since | six |
so | some | somebody | somehow | someone |
something | sometime | sometimes | somewhat | somewhere |
soon | sorry | specified | specify | specifying |
still | sub | such | sup | sure |
t's | take | taken | tell | tends |
th | than | thank | thanks | thanx |
that | that's | thats | the | their |
theirs | them | themselves | then | thence |
there | there's | thereafter | thereby | therefore |
therein | theres | thereupon | these | they |
they'd | they'll | they're | they've | think |
third | this | thorough | thoroughly | those |
though | three | through | throughout | thru |
thus | to | together | too | took |
toward | towards | tried | tries | truly |
try | trying | twice | two | un |
under | unfortunately | unless | unlikely | until |
unto | up | upon | us | use |
used | useful | uses | using | usually |
value | various | very | via | viz |
vs | want | wants | was | wasn't |
way | we | we'd | we'll | we're |
we've | welcome | well | went | were |
weren't | what | what's | whatever | when |
whence | whenever | where | where's | whereafter |
whereas | whereby | wherein | whereupon | wherever |
whether | which | while | whither | who |
who's | whoever | whole | whom | whose |
why | will | willing | wish | with |
within | without | won't | wonder | would |
wouldn't | yes | yet | you | you'd |
you'll | you're | you've | your | yours |
yourself | yourselves | zero |