MySQL can perform boolean full-text searches using the
IN BOOLEAN MODE modifier. With this modifier,
certain characters have special meaning at the beginning or end
of words in the search string. In the following query, the
+ and - operators indicate
that a word must be present or absent, respectively, for a match
to occur. Thus, the query retrieves all the rows that contain
the word “MySQL” but that do
not contain the word
“YourSQL”:
mysql>SELECT * FROM articles WHERE MATCH (title,body)AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);+----+-----------------------+-------------------------------------+ | id | title | body | +----+-----------------------+-------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 2 | How To Use MySQL Well | After you went through a ... | | 3 | Optimizing MySQL | In this tutorial we will show ... | | 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | | 6 | MySQL Security | When configured properly, MySQL ... | +----+-----------------------+-------------------------------------+
In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which
+stands forAND-stands forNOT[no operator] implies
OR
Boolean full-text searches have these characteristics:
They do not automatically sort rows in order of decreasing relevance.
InnoDBtables require aFULLTEXTindex on all columns of theMATCH()expression to perform boolean queries. Boolean queries against aMyISAMsearch index can work even without aFULLTEXTindex, although a search executed in this fashion would be quite slow.The minimum and maximum word length full-text parameters apply to
FULLTEXTindexes created using the built-inFULLTEXTparser and MeCab parser plugin.innodb_ft_min_token_sizeandinnodb_ft_max_token_sizeare used forInnoDBsearch indexes.ft_min_word_lenandft_max_word_lenare used forMyISAMsearch indexes.Minimum and maximum word length full-text parameters do not apply to
FULLTEXTindexes created using the ngram parser. ngram token size is defined by thengram_token_sizeoption.The stopword list applies, controlled by
innodb_ft_enable_stopword,innodb_ft_server_stopword_table, andinnodb_ft_user_stopword_tableforInnoDBsearch indexes, andft_stopword_fileforMyISAMones.InnoDBfull-text search does not support the use of multiple operators on a single search word, as in this example:'++apple'. Use of multiple operators on a single search word returns a syntax error to standard out. MyISAM full-text search will successfully process the same search ignoring all operators except for the operator immediately adjacent to the search word.InnoDBfull-text search only supports leading plus or minus signs. For example,InnoDBsupports'+apple'but does not support'apple+'. Specifying a trailing plus or minus sign causesInnoDBto report a syntax error.InnoDBfull-text search does not support the use of a leading plus sign with wildcard ('+*'), a plus and minus sign combination ('+-'), or leading a plus and minus sign combination ('+-apple'). These invalid queries return a syntax error.InnoDBfull-text search does not support the use of the@symbol in boolean full-text searches. The@symbol is reserved for use by the@distanceproximity search operator.They do not use the 50% threshold that applies to
MyISAMsearch indexes.
The boolean full-text search capability supports the following operators:
+A leading or trailing plus sign indicates that this word must be present in each row that is returned.
InnoDBonly supports leading plus signs.-A leading or trailing minus sign indicates that this word must not be present in any of the rows that are returned.
InnoDBonly supports leading minus signs.Note: The
-operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by-returns an empty result. It does not return “all rows except those containing any of the excluded terms.”(no operator)
By default (when neither
+nor-is specified), the word is optional, but the rows that contain it are rated higher. This mimics the behavior ofMATCH() ... AGAINST()without theIN BOOLEAN MODEmodifier.@distanceThis operator works on
InnoDBtables only. It tests whether two or more words all start within a specified distance from each other, measured in words. Specify the search words within a double-quoted string immediately before the@operator, for example,distanceMATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)> <These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The
>operator increases the contribution and the<operator decreases it. See the example following this list.( )Parentheses group words into subexpressions. Parenthesized groups can be nested.
~A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the
-operator.*The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the
*operator.If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short or a stopword. Whether a word is too short is determined from the
innodb_ft_min_token_sizesetting forInnoDBtables, orft_min_word_lenforMyISAMtables. These options are not applicable toFULLTEXTindexes that use the ngram parser.The wildcarded word is considered as a prefix that must be present at the start of one or more words. If the minimum word length is 4, a search for
'+could return fewer rows than a search forword+the*''+, because the second query ignores the too-short search termword+the'the."A phrase that is enclosed within double quote (“
"”) characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in theFULLTEXTindex for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example,"test phrase"matches"test, phrase".If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words.
The following examples demonstrate some search strings that use boolean full-text operators:
'apple banana'Find rows that contain at least one of the two words.
'+apple +juice'Find rows that contain both words.
'+apple macintosh'Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.
'+apple -macintosh'Find rows that contain the word “apple” but not “macintosh”.
'+apple ~macintosh'Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if row does not. This is “softer” than a search for
'+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.'+apple +(>turnover <strudel)'Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.
'apple*'Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
'"some words"'Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the “
"” characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
InnoDB full-text search is
modeled on the
Sphinx full-text
search engine, and the algorithms used are based on
BM25
and
TF-IDF
ranking algorithms. For these reasons, relevancy rankings for
InnoDB boolean full-text search may differ
from MyISAM relevancy rankings.
InnoDB uses a variation of the “term
frequency-inverse document frequency”
(TF-IDF) weighting system to rank a
document's relevance for a given full-text search query. The
TF-IDF weighting is based on how frequently
a word appears in a document, offset by how frequently the
word appears in all documents in the collection. In other
words, the more frequently a word appears in a document, and
the less frequently the word appears in the document
collection, the higher the document is ranked.
How Relevancy Ranking is Calculated
The term frequency (TF) value is the number
of times that a word appears in a document. The inverse
document frequency (IDF) value of a word is
calculated using the following formula, where
total_records is the number of records in
the collection, and matching_records is the
number of records that the search term appears in.
${IDF} = log10( ${total_records} / ${matching_records} ) When a document contains a word multiple times, the IDF value is multiplied by the TF value:
${TF} * ${IDF}
Using the TF and IDF
values, the relevancy ranking for a document is calculated
using this formula:
${rank} = ${TF} * ${IDF} * ${IDF}The formula is demonstrated in the following examples.
Relevancy Ranking for a Single Word Search
This example demonstrates the relevancy ranking calculation for a single-word search.
mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
Query OK, 0 rows affected (1.04 sec)
mysql> INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','This database tutorial ...'),
("How To Use MySQL",'After you went through a ...'),
('Optimizing Your Database','In this database tutorial ...'),
('MySQL vs. YourSQL','When comparing databases ...'),
('MySQL Security','When configured properly, MySQL ...'),
('Database, Database, Database','database database database'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL Full-Text Indexes', 'MySQL fulltext indexes use a ..');
Query OK, 8 rows affected (0.06 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> SELECT id, title, body, MATCH (title,body) AGAINST ('database' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+---------------------+
| id | title | body | score |
+----+------------------------------+-------------------------------------+---------------------+
| 6 | Database, Database, Database | database database database | 1.0886961221694946 |
| 3 | Optimizing Your Database | In this database tutorial ... | 0.36289870738983154 |
| 1 | MySQL Tutorial | This database tutorial ... | 0.18144935369491577 |
| 2 | How To Use MySQL | After you went through a ... | 0 |
| 4 | MySQL vs. YourSQL | When comparing databases ... | 0 |
| 5 | MySQL Security | When configured properly, MySQL ... | 0 |
| 7 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | 0 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a .. | 0 |
+----+------------------------------+-------------------------------------+---------------------+
8 rows in set (0.00 sec)
There are 8 records in total, with 3 that match the
“database” search term. The first record
(id 6) contains the search term 6 times and
has a relevancy ranking of
1.0886961221694946. This ranking value is
calculated using a TF value of 6 (the
“database” search term appears 6 times in record
id 6) and an IDF value
of 0.42596873216370745, which is calculated as follows (where
8 is the total number of records and 3 is the number of
records that the search term appears in):
${IDF} = log10( 8 / 3 ) = 0.42596873216370745
The TF and IDF values
are then entered into the ranking formula:
${rank} = ${TF} * ${IDF} * ${IDF}Performing the calculation in the MySQL command-line client returns a ranking value of 1.088696164686938.
mysql> SELECT 6*log10(8/3)*log10(8/3); +-------------------------+ | 6*log10(8/3)*log10(8/3) | +-------------------------+ | 1.088696164686938 | +-------------------------+ 1 row in set (0.00 sec)
You may notice a slight difference in the ranking values
returned by the SELECT ... MATCH ...
AGAINST statement and the MySQL command-line
client (1.0886961221694946 versus
1.088696164686938). The difference is due
to how the casts between integers and floats/doubles are
performed internally by InnoDB (along
with related precision and rounding decisions), and how they
are performed elsewhere, such as in the MySQL command-line
client or other types of calculators.
Relevancy Ranking for a Multiple Word Search
This example demonstrates the relevancy ranking calculation
for a multiple-word full-text search based on the
articles table and data used in the
previous example.
If you search on more than one word, the relevancy ranking value is a sum of the relevancy ranking value for each word, as shown in this formula:
${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}Performing a search on two terms ('mysql tutorial') returns the following results:
mysql> SELECT id, title, body, MATCH (title,body) AGAINST ('mysql tutorial' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+----------------------+
| id | title | body | score |
+----+------------------------------+-------------------------------------+----------------------+
| 1 | MySQL Tutorial | This database tutorial ... | 0.7405621409416199 |
| 3 | Optimizing Your Database | In this database tutorial ... | 0.3624762296676636 |
| 5 | MySQL Security | When configured properly, MySQL ... | 0.031219376251101494 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a .. | 0.031219376251101494 |
| 2 | How To Use MySQL | After you went through a ... | 0.015609688125550747 |
| 4 | MySQL vs. YourSQL | When comparing databases ... | 0.015609688125550747 |
| 7 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | 0.015609688125550747 |
| 6 | Database, Database, Database | database database database | 0 |
+----+------------------------------+-------------------------------------+----------------------+
8 rows in set (0.00 sec)
In the first record (id 8), 'mysql' appears
once and 'tutorial' appears twice. There are six matching
records for 'mysql' and two matching records for 'tutorial'.
The MySQL command-line client returns the expected ranking
value when inserting these values into the ranking formula for
a multiple word search:
mysql> SELECT (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2)); +-------------------------------------------------------+ | (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2)) | +-------------------------------------------------------+ | 0.7405621541938003 | +-------------------------------------------------------+ 1 row in set (0.00 sec)
The slight difference in the ranking values returned by the
SELECT ... MATCH ... AGAINST statement
and the MySQL command-line client is explained in the
preceding example.
E.g.:
SELECT * FROM quotes_table WHERE MATCH (quote) AGAINST ('"mangé"' IN BOOLEAN MODE)
For multiword searches:
SELECT * FROM quotes_table MATCH (quote) AGAINST ('"mangé" "pensé"' IN BOOLEAN MODE)
SELECT * FROM quotes_table MATCH (quote) AGAINST ('+"mangé" +"pensé"' IN BOOLEAN MODE)
Although the double quotes are intended to enable phrase searching, just like any web search engine for example, you can also use them to signify single words where accents and other diacritics matter.
The only drawback to this method seems to be that the asterisk operator is mutually exclusive with the double quote. Or I just haven't been able to combine both effectively.
Words that are shorter than the minimum word length (by default, words with up to 3 characters) are sometimes taken into consideration when you search for phrases, but sometimes not!
Example 1:
A search for the phrase "the creation" will find all records that really contain this phrase, and only those. So, a record containing only "la creation du monde", even without the accent aigu on the e in creation, won't be found. This is just fine and what one would expect.
Example 2: A search for the phrase "let it be" won't find any record, not even records containing something like "The Beatles: Let It Be". According to the MySQL team, this is not a bug.
I personally find it very counterintuitive to sometimes take short words into consideration for phrase searches, but only if there is at least one properly long word in the search phrase.
$query= "SELECT code, category, header, date FROM articles WHERE MATCH (text,header,summary) AGAINST ('" . stripslashes (str_replace (""", "\"", ($_POST['keywords']))) . "' IN BOOLEAN MODE)";
PHP, or some setups or with some browsers, convert double quotes from POST data to their HTML-equivalents even without being asked to do that. The above will fix the issue. Stripslashes() is optional, I prefer to keep it in to keep things looking clean, though the \" doesn't seem to break the boolean literal search.
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3' in boolean mode)
AS score FROM table1
WHERE MATCH (text) AGAINST ('word1 word2 word3' in boolean mode) order by score desc;
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3' in boolean mode)
AS score FROM table1
WHERE MATCH (text) AGAINST ('word1 word2 word3' in boolean mode) order by score desc;
I tried this and it failed to work, i.e. all the scores turned out to be '1' even though the number of matches differed from record to record.
SELECT *, ( (1.3 * (MATCH(title) AGAINST ('+term +term2' IN BOOLEAN MODE))) + (0.6 * (MATCH(text) AGAINST ('+term +term2' IN BOOLEAN MODE))) ) AS relevance FROM [table_name] WHERE ( MATCH(title,text) AGAINST ('+term +term2' IN BOOLEAN MODE) ) HAVING relevance > 0 ORDER BY relevance DESC;
Here we artificially manipulate the relevancy score to give title more weight by multiplying by the constant 1.3. In the above query, it doesn't seem to matter whether I have 3 fulltext indexes or just one comprising the title and text columns. From my testing, the results appear to be the same.
In my tests it would seem that the score returned is an integer equal to the number of words matched. So if you match on 3 words the scores will range from 1 to 3. If you match only on one word, or only one word is matched in any document, then the scores would all be 1.
To enhance sorting of the results in boolean mode you can do the following:
SELECT id, text, MATCH (text) AGAINST ('word1 word2 word3')
AS score FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' in boolean mode) order by score desc;
Using the first MATCH() we get the score in non-boolean search mode (more distinctive). The second MATCH() ensures we really get back only the results we want (with all 3 words). If you want to do 'any of the words' search only, it's better to use non-boolean search instead (unless you are using boolean in order to get rid of 50% treshold limit).
It seems that non-boolean mode returns a "real" relevancy (based on how often each term was found I presume). In boolean mode it only returns how many terms were found. To me, this is not really relevancy. For instance, if searching on 2 terms, one result may have 20 occurrences of each term and another may have only 1 occurrence of each word, yet they will both return "2" as their relevance.
select products_id,match(products_model) against ('printer' ) as Relevance
from products
where match(products_model) against ('+"printer"' in boolean mode)
if You want to combine the phrase search with the *, simply search for the words separately and apply a having clause.
For example:
If You wish to find `bird cathcing`, `bird cathcer`, `bird cathers`
try this
SELECT column, MATCH( column ) AGAINST ('bird catch' IN BOOLEAN MODE) AS rank
FROM mytable
WHERE MATCH( column ) AGAINST ('bird catch' IN BOOLEAN MODE) > 0
HAVING column LIKE '%bird catch%'
ORDER BY rank DESC
I know the HAVING clause is slow, but it is only allpied to the rows that match the search criteira.
Have'n tested on big tables, but I have a feeling it works just fine.
SELECT *, MATCH(post_content, post_title) AGAINST('string') AS `score` FROM posts MATCH(post_content, post_title) AGAINST('string' IN BOOLEAN MODE) ORDER BY score DESC LIMIT 10
The boolean mode after the from statement automatically pulls out the 0's and then the Match statement in the SELECT clause allows me to get the relevance score so I can sort by it.
What I did was break the query up into two sets of words, one set contained all the words that were >= ft_min_word_len, the other set contained all the shorter words. I did a fulltext search on the words that met the fulltext search length requirement and augmented it with an AND clause that used LIKE '%<shortword1>%'. MySQL uses the fulltext index to narrow down the results and then applies the LIKE conditions, so it stays fast.
This way you don't need to change your ft_min_word_len to a smaller number, which will make your indexes bigger.
Here's the regex I used to break things up:
\b(\w{4,})\b
where ft_min_word_len = 4, that will match all words of 4 or more letters.
Example
Search for the string "axle hub nut" will result in:
WHERE MATCH (col) AGAINST ("+axle*" IN BOOLEAN MODE) AND col LIKE '%hub%' AND col LIKE '%nut%'
SELECT MATCH (x) AGAINST ('word1 word2' IN BOOLEAN MODE) AS relevance
The returned relevance is 0,1 or 2. 0 = no match, 1 = one of the words is matched (word1 or word2), 2 = both word are matched.
When using eg. '+word1 word2 word3' it returns a floating point number (double) which is the relevance (all returned rows contain word1, some are more relevant (they return word2 and word3, others less..they contain word1 and (word2 or word3).. others even less..containing only word1 (relevance = 1).
SELECT id,
store,
name,
(((MATCH(name)
AGAINST(?) * 1.2) +
MATCH(description, keywords)
AGAINST (?) +
((MATCH(creators)
AGAINST (?) * 1.2)) / 3) +
(((store IN ('xfx','w3d')) * .12) * ((store IN ('iv', '3da', 'vp')) * .1))) *
MATCH(description,keywords,requirements,creators,name)
AGAINST(? IN BOOLEAN MODE)
AS sort_rel
FROM prod_text
WHERE MATCH(description,keywords,requirements,creators,name)
AGAINST(? IN BOOLEAN MODE)
AND store != 'pp'
HAVING sort_rel > 0.2
ORDER BY sort_rel DESC
All placeholders (?s) take the search terms entered by the user in this case.
Instead of enabling a smaller size, what I do is this:
I preprocess against a list of three letter words I want to be matchable. For any row I find, I insert a different string into the indexed text the same number of times (this column need not actually be the one holding returned text mind you, it can be a duplicate column used only for indexing and searching).
When a user searches, I preprocess his or her search query in the same way, replacing any of the words I want matchable with the longer string.
For instance, if I wanted the search to be able to match 'GUN' I would search (using LIKE) for all rows matching and get back the results into a perlscript:
my $check_for_tlw_q = <<"EOF";
SELECT id, fulltext_indexed
FROM some_table
WHERE fulltext_displayed RLIKE '(^|[[:blank:]])GUN([[:blank:]]|$)'
AND fulltext_indexed NOT RLIKE '(^|[[:blank:]])TLWGUNTLW([[:blank:]]|$)'
EOF
my $check_for_tlw = $dbh->prepare($check_for_tlw_q);
my $update_tlw_q = <<<"EOF";
UPDATE some_table
SET fulltext_indexed = ?
WHERE id = ?
EOF
my $update_tlw = $dbh->prepare($update_tlw_q);
$check_for_tlw->execute;
while (my $row = $check_for_tlw->fetchrow_hashref) {
$row->{fulltext_indexed} =~ s/\bgun\b/TLWGUNTLW/igsm;
$update_tlw->execute($row->{fulltext_indexed}, $row->{id});
}
Then when a user searches for 'machine gun' the search is substituted to be 'machine TLWGUNTLW' behind the scenes. This has the effect of making 'gun' matchable without enabling words like 'the' and 'and' and 'les' and 'how' and 'brb' and thus not bogging down the index except where necessary.
This is impportant specially if you have more than two fields to match against, field artist, field album, field producer, you want to list with artist then album then producer.
1. Build your search table
CREATE TABLE my_search
(
id int(10),
category smallint(5),
score smallint(5),
text_data text
)
ENGINE=MyISAM
DEFAULT CHARSET=latin1
ROW_FORMAT=COMPRESSED;
2. Fill the table with data
Fill the search table with data giving a greater score value, to more important data.
For example with a product database:
Insert the product names, make and model into the "text_data" column and give it a score of 5 (or similarly higher score). Insert the product specification and give this a score of 3 (medium value) and then insert the product description and any other related text with a score of 1 (low value).
Therefore, in this example, we insert 3 rows for the three items of data with differing scores relating to the data's importance.
3. Create indexes
CREATE FULLTEXT INDEX idx_1 ON my_search (text_data);
CREATE INDEX idx_2 ON my_search (id);
You may wish to create indexes for the other columns, such as category, if they are included in your proposed SELECT statement.
4. Execute your SELECT
The following Select statement will return your search results sorted with matches against product name towards the top. If the match is against product name *and* specification then this has an even higher score (relevance) and appears higher in the list.
SELECT id, sum(score)
FROM my_search
WHERE MATCH(text_data) AGAINST ('search for this text' IN BOOLEAN MODE)
GROUP BY id
ORDER BY sum(score) DESC;
--
As I said at the beginning, this is not a perfectly accurate list by *real* computed relevance. However, this may provide help in some circumstances with extremely large databases (mine has 2.2 million rows).
SHOW VARIABLES LIKE 'collation%'
Variable_name Value
collation_connection utf8_unicode_ci
collation_database latin1_general_cs
collation_server latin1_swedish_ci
... meant the above example did not return both rows but instead only the row which contained the 'database' string.
I was able to resolve this by explicilty setting the character set/collation to one that that is case in-sensitive
CREATE TABLE articles(
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY ,
title VARCHAR( 200 ) ,
body TEXT,
FULLTEXT (
title,
body
)
) DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;
Straightforward query:
SELECT id, article FROM mytable WHERE MATCH(article) AGAINST ('+word1 +word2' IN BOOLEAN MODE)
does return result, but is very slow.
At the same time, another query
SELECT id, article FROM mytable WHERE MATCH(article) AGAINST ('+word1' IN BOOLEAN MODE) AND MATCH(article) AGAINST ('+word2' IN BOOLEAN MODE)
returns the same result, but could be 100 times faster...
This is true for version 4.1
Or, you can use Markus' example. Don't forget to sanitize user input :-)