Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


HeatWave User Guide  /  ...  /  Training a Recommendation Model

3.11.2 Training a Recommendation Model

When training a recommendation model, there are two mandatory options: users and items.

For content-based recommendation models, the item_metadata option is also required, which specifies the table that has item descriptions. This table must only have two columns: one corresponding to the item_id, and the other with a TEXT data type (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT) that has the description of the item.

Run the ML_TRAIN routine to create a recommendation model, and use the following JSON options:

  • task: recommendation: Specifies the machine learning task.

  • users: Specifies the column name corresponding to the user ids. Values in this column must be in a STRING data type, otherwise an error will be generated during training.

  • items: Specifies the column name corresponding to the item ids. Values in this column must be in a STRING data type, otherwise an error will be generated during training.

  • feedback: The type of feedback for a recommendation model, explicit, the default, or implicit.

  • feedback_threshold: The feedback threshold for a recommendation model that uses implicit feedback. It represents the threshold required to be considered positive feedback. For example, if numerical data records the number of times users interact with an item, you might set a threshold with a value of 3. This means users would need to interact with an item more than three times to be considered positive feedback.

  • item_metadata: To be used with content-based recommendation models using implicit feedback. It is a JSON object that can have the table_name option as a key, which specifies the table that has item descriptions.

    • table_name: To be used with the item_metadata option. It specifies the table name that has item descriptions for content-based recommendation models. It must be a string in a fully qualified format (schema_name.table_name) that specifies the table name.

If the users or items column contains NULL values, the corresponding rows will be dropped and will not be considered during training.

See Section 3.5, “Training a Model”, and for full details of all the options, see ML_TRAIN.

Syntax Examples for Explicit Feedback

  • An ML_TRAIN example that specifies the recommendation task type.

    mysql> CALL sys.ML_TRAIN('table_train', 'target_column_feature', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_column_feature', 
              'items', 'item_column_feature'), 
              @model);
  • An ML_TRAIN example that specifies the SVD model type.

    mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'model_list', JSON_ARRAY('SVD')), 
              @model);
    Query OK, 0 rows affected (11.31 sec)
    
    mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle=@model;
    +------------+
    | model_type |
    +------------+
    | SVD        |
    +------------+
    1 row in set (0.00 sec)
  • An ML_TRAIN example that specifies the SVDpp model type.

    mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'model_list',  JSON_ARRAY('SVDpp')), 
              @model);
    Query OK, 0 rows affected (13.97 sec)
    
    mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle=@model;
    +------------+
    | model_type |
    +------------+
    | SVDpp      |
    +------------+
    1 row in set (0.00 sec)
  • An ML_TRAIN example that specifies the NMF model type.

    mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'model_list', JSON_ARRAY('NMF')), 
              @model);
    Query OK, 0 rows affected (12.28 sec)
    
    mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle=@model;
    +------------+
    | model_type |
    +------------+
    | NMF        |
    +------------+
    1 row in set (0.00 sec)
  • An ML_TRAIN example that specifies three models for the model_list option.

    mysql> SET @allowed_models = JSON_ARRAY('SVD', 'SVDpp', 'NMF');
    
    mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating',
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'model_list', CAST(@allowed_models AS JSON)), 
              @model);
    Query OK, 0 rows affected (14.88 sec)
    
    mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle=@model;
    +------------+
    | model_type |
    +------------+
    | SVD        |
    +------------+
    1 row in set (0.00 sec)
  • An ML_TRAIN example that specifies five models for the exclude_model_list option.

    mysql> SET @exclude_models= JSON_ARRAY('NormalPredictor', 'Baseline', 'SlopeOne', 'CoClustering', 'SVD');
    
    mysql> CALL sys.ML_TRAIN('mlcorpus.foursquare_NYC_train', 'rating', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'exclude_model_list', CAST(@exclude_models AS JSON)), 
              @model);
    Query OK, 0 rows affected (14.71 sec)
    
    mysql> SELECT model_type FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle=@model;
    +------------+
    | model_type |
    +------------+
    | SVDpp      |
    +------------+
    1 row in set (0.00 sec)

Syntax Example for Implicit Feedback

  • An ML_TRAIN example that specifies the recommendation task with implicit feedback.

    mysql> SELECT * FROM training_table LIMIT 10;
    +---------+---------+--------+
    | user_id | item_id | rating |
    +---------+---------+--------+
    | 836     | 226     |      1 |
    | 3951    | 14918   |      1 |
    | 1048    | 2498    |      1 |
    | 4312    | 2559    |      1 |
    | 4882    | 12930   |      1 |
    | 3226    | 8400    |      1 |
    | 3455    | 5120    |      1 |
    | 830     | 12537   |      1 |
    | 4525    | 791     |      1 |
    | 2303    | 14243   |      1 |
    +---------+---------+--------+
    
    mysql> CALL sys.ML_TRAIN('mlcorpus.training_table', 'rating', 
              JSON_OBJECT('task', 'recommendation', 
              'users', 'user_id', 
              'items', 'item_id', 
              'feedback', 'implicit'), 
              @model);
    Query OK, 0 rows affected (2 min 13.6415 sec)

Syntax Example for Content-Based Recommendation Model

  • An ML_TRAIN example that trains a content-based recommendation model by specifying a table with item descriptions (mlcorpus_recsys.`citeulike_items_sample). The optimization metric hit_ratio_at_k is used. The model must use implicit feedback.

    mysql> CALL sys.ML_TRAIN('mlcorpus_recsys.`citeulike_train_sample`', 'rating', 
     JSON_OBJECT('task', 'recommendation', 'model_list', 
    JSON_ARRAY('CTR'), 'users', 'user_id', 'items', 'item_id','feedback', 'implicit', 'optimization_metric', 'hit_ratio_at_k',
    'item_metadata', JSON_OBJECT('table_name', 'mlcorpus_recsys.`citeulike_items_sample`')), 
    @model);