When training a recommendation model, there are two mandatory
options: users
and
items
.
For content-based recommendation models, the
item_metadata
option is also required,
which specifies the table that has item descriptions. This
table must only have two columns: one corresponding to the
item_id
, and the other with a
TEXT
data type (TINYTEXT, TEXT, MEDIUMTEXT,
LONGTEXT) that has the description of the item.
Run the ML_TRAIN
routine to
create a recommendation model, and use the following
JSON
options
:
task
:recommendation
: Specifies the machine learning task.users
: Specifies the column name corresponding to the user ids.items
: Specifies the column name corresponding to the item ids.feedback
: The type of feedback for a recommendation model,explicit
, the default, orimplicit
.feedback_threshold
: The feedback threshold for a recommendation model that uses implicit feedback. It represents the threshold required to be considered positive feedback. For example, if numerical data records the number of times users interact with an item, you might set a threshold with a value of 3. This means users would need to interact with an item more than three times to be considered positive feedback.-
item_metadata
: To be used with content-based recommendation models using implicit feedback. It is a JSON object that can have thetable_name
option as a key, which specifies the table that has item descriptions.table_name
: To be used with theitem_metadata
option. It specifies the table name that has item descriptions for content-based recommendation models. It must be a string in a fully qualified format (schema_name.table_name) that specifies the table name.
If the users
or items
column contains NULL
values, the
corresponding rows will be dropped and will not be considered
during training.
See Section 3.5, “Training a Model”, and for full details
of all the options
, see
ML_TRAIN
.
-
An
ML_TRAIN
example that specifies therecommendation
task type.mysql> CALL sys.ML_TRAIN('table_train', 'target_column_feature', JSON_OBJECT('task', 'recommendation', 'users', 'user_column_feature', 'items', 'item_column_feature'), @model);
-
An
ML_TRAIN
example that specifies theSVD
model type.mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'model_list', JSON_ARRAY('SVD')), @model); Query OK, 0 rows affected (11.31 sec) mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG WHERE model_handle=@model; +------------+ | model_type | +------------+ | SVD | +------------+ 1 row in set (0.00 sec)
-
An
ML_TRAIN
example that specifies theSVDpp
model type.mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'model_list', JSON_ARRAY('SVDpp')), @model); Query OK, 0 rows affected (13.97 sec) mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG WHERE model_handle=@model; +------------+ | model_type | +------------+ | SVDpp | +------------+ 1 row in set (0.00 sec)
-
An
ML_TRAIN
example that specifies theNMF
model type.mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'model_list', JSON_ARRAY('NMF')), @model); Query OK, 0 rows affected (12.28 sec) mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG WHERE model_handle=@model; +------------+ | model_type | +------------+ | NMF | +------------+ 1 row in set (0.00 sec)
-
An
ML_TRAIN
example that specifies three models for themodel_list
option.mysql> SET @allowed_models = JSON_ARRAY('SVD', 'SVDpp', 'NMF'); mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'model_list', CAST(@allowed_models AS JSON)), @model); Query OK, 0 rows affected (14.88 sec) mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG WHERE model_handle=@model; +------------+ | model_type | +------------+ | SVD | +------------+ 1 row in set (0.00 sec)
-
An
ML_TRAIN
example that specifies five models for theexclude_model_list
option.mysql> SET @exclude_models= JSON_ARRAY('NormalPredictor', 'Baseline', 'SlopeOne', 'CoClustering', 'SVD'); mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'exclude_model_list', CAST(@exclude_models AS JSON)), @model); Query OK, 0 rows affected (14.71 sec) mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG WHERE model_handle=@model; +------------+ | model_type | +------------+ | SVDpp | +------------+ 1 row in set (0.00 sec)
-
An
ML_TRAIN
example that specifies therecommendation
task with implicit feedback.mysql> SELECT * FROM training_table LIMIT 10; +---------+---------+--------+ | user_id | item_id | rating | +---------+---------+--------+ | 836 | 226 | 1 | | 3951 | 14918 | 1 | | 1048 | 2498 | 1 | | 4312 | 2559 | 1 | | 4882 | 12930 | 1 | | 3226 | 8400 | 1 | | 3455 | 5120 | 1 | | 830 | 12537 | 1 | | 4525 | 791 | 1 | | 2303 | 14243 | 1 | +---------+---------+--------+ mysql> CALL sys.ML_TRAIN('mlcorpus.training_table', 'rating', JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id', 'feedback', 'implicit'), @model); Query OK, 0 rows affected (2 min 13.6415 sec)
-
An
ML_TRAIN
example that trains a content-based recommendation model by specifying a table with item descriptions (mlcorpus_recsys.`citeulike_items_sample
). The optimization metrichit_ratio_at_k
is used. The model must use implicit feedback.mysql> CALL sys.ML_TRAIN('mlcorpus_recsys.`citeulike_train_sample`', 'rating', JSON_OBJECT('task', 'recommendation', 'model_list', JSON_ARRAY('CTR'), 'users', 'user_id', 'items', 'item_id','feedback', 'implicit', 'optimization_metric', 'hit_ratio_at_k', 'item_metadata', JSON_OBJECT('table_name', 'mlcorpus_recsys.`citeulike_items_sample`')), @model);