Related Documentation Download this Manual
PDF (US Ltr) - 1.6Mb
PDF (A4) - 1.6Mb


MySQL HeatWave User Guide  /  ...  /  ML_PREDICT_TABLE

3.15.5 ML_PREDICT_TABLE

ML_PREDICT_TABLE generates predictions for an entire table of unlabeled data and saves the results to an output table. HeatWave AutoML performs the predictions in parallel.

ML_PREDICT_TABLE is a compute intensive process. Limiting operations to batches of 10 to 100 rows by splitting large tables into smaller tables is recommended. MySQL 8.2.0 adds batch processing with the batch_size option.

A loaded model is required to run ML_PREDICT_TABLE. See Section 3.13.3, “Loading Models”.

As of MySQL 8.0.32, the returned table includes a primary key:

  • If the input table has a primary key, the output table will have the same primary key.

  • If the input table does not have a primary key, the output table will have a new primary key named _id that auto increments.

The input table must not have a column with the name _id that is not a primary key.

As of MySQL 8.0.32, the returned table also includes the ml_results column which contains the prediction results and the data. MySQL 8.1.0 includes support for text data types. The combination of results and data must be less than 65,532 characters.

MySQL 8.2.0 adds recommendation models that use implicit feedback to learn and recommend rankings for users and items. MySQL 8.2.0 also adds batch processing with the batch_size option.

ML_PREDICT_TABLE Syntax

MySQL 8.2.0 adds more options to support the recommendation task and batch processing.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name), [options]);
 
options: {
    JSON_OBJECT('key','value'[,'key','value'] ...)
        'key','value':
        |'threshold', 'N'
        |'topk', 'N'
        |'recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL
        |'remove_seen', {'true'|'false'}
        |'batch_size', 'N'
}

MySQL 8.1.0 adds more options to support the recommendation task.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name), [options]);
 
options: {
    JSON_OBJECT('key','value'[,'key','value'] ...)
        'key','value':
        |'threshold', 'N'
        |'topk', 'N'
        |'recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL
}

MySQL 8.0.33 added options that support the recommendation task.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name), [options]);
 
options: {
    JSON_OBJECT('key','value'[,'key','value'] ...)
        'key','value':
        |'threshold', 'N'
        |'topk', 'N'
        |'recommend', {'ratings'|'items'|'users'}|NULL
}

MySQL 8.0.32 added an options parameter in JSON format that supports the anomaly_detection task. For all other tasks, set this parameter to NULL.

MySQL 8.0.32 allows a call to ML_PREDICT_TABLE to include columns that were not present during ML_TRAIN. A table can include extra columns, and still use the HeatWave AutoML model. This allows side by side comparisons of target column labels, ground truth, and predictions in the same table. ML_PREDICT_TABLE ignores any extra columns, and appends them to the results.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name), [options]);
 
options: {
    JSON_OBJECT('key','value'[,'key','value'] ...)
        'key','value':
        |'threshold', 'N'
        |'topk', 'N'
}

Before MySQL 8.0.32:

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name);

ML_PREDICT_TABLE parameters:

  • table_name: Specifies the fully qualified name of the input table (schema_name.table_name). The input table should contain the same feature columns as the training dataset but no target column.

  • model_handle: Specifies the model handle or a session variable containing the model handle

  • output_table_name: Specifies the table where predictions are stored. The table is created if it does not exist. A fully qualified table name must be specified (schema_name.table_name). If the table already exists, an error is returned.

  • options: A set of options in JSON format.

    As of MySQL 8.0.33, this parameter supports the recommendation and anomaly_detection tasks.

    As of MySQL 8.0.32, this parameter only supports the anomaly_detection task.

    For all other tasks, set this parameter to NULL. Before MySQL 8.0.32, ignore this parameter.

    • threshold: The optional threshold for use with the anomaly_detection task to convert anomaly scores to 1: an anomaly or 0: normal. 0 < threshold < 1. The default value is (1 - contamination)-th percentile of all the anomaly scores.

    • topk: The optional top K rows for use with the anomaly_detection and recommendation tasks. A positive integer between 1 and the table length.

      For the anomaly_detection task, the results include the top K rows with the highest anomaly scores. If topk is not set, ML_PREDICT_TABLE uses threshold.

      For an anomaly_detection task, do not set both threshold and topk. Use threshold or topk, or set options to NULL.

      For the recommendation task, the number of recommendations to provide. The default is 3.

      A recommendation task with implicit feedback can use both threshold and topk.

    • recommend: Use with the recommendation task to specify what to recommend. Permitted values are:

      • ratings: Use this option to predict ratings. This is the default value.

        The target column is prediction, and the values are float.

        The input table must contain at least two columns with the same names as the user column and item column from the training model.

      • items: Use this option to recommend items for users.

        The target column is item_recommendation, and the values are:

        JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), 
                    "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))

        The input table must at least contain a column with the same name as the user column from the training model.

      • users: Use this option to recommend users for items.

        The target column is user_recommendation, and the values are:

        JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), 
                    "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))

        The input table must at least contain a column with the same name as the item column from the training model.

      • users_to_items: This is the same as items.

      • items_to_users: This is the same as users.

      • items_to_items: Use this option to recommend similar items for items.

        The target column is item_recommendation, and the values are:

        JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))

        The input table must at least contain a column with the same name as the item column from the training model.

      • users_to_users: Use this option to recommend similar users for users.

        The target column is user_recommendation, and the values are:

        JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))

        The input table must at least contain a column with the same name as the user column from the training model.

    • remove_seen: If the input table overlaps with the training table, and remove_seen is true, then the model will not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.

    • batch_size: The size of each batch. 1 ≤ batch_size ≤ 1,000. The default is 1,000, and this provides the best results.

Syntax Examples

  • A typical usage example that specifies the fully qualified name of the table to generate predictions for, the session variable containing the model handle, and the fully qualified output table name:

    mysql> CALL sys.ML_PREDICT_TABLE('ml_data.iris_test', @iris_model, 
              'ml_data.iris_predictions', NULL);

    To view ML_PREDICT_TABLE results, query the output table. The table shows the predictions and the feature column values used to make each prediction. The table includes the primary key, _id, and the ml_results column, which uses JSON format:

    mysql> SELECT * from ml_data.iris_predictions LIMIT 5;
    +-----+--------------+-------------+--------------+-------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------+
    | _id | sepal length | sepal width | petal length | petal width | class           | Prediction      | ml_results                                                                                                                            |
    +-----+--------------+-------------+--------------+-------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------+
    |   1 |          7.3 |         2.9 |          6.3 |         1.8 | Iris-virginica  | Iris-virginica  | {'predictions': {'class': 'Iris-virginica'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 0.13, 'Iris-virginica': 0.87}}  |
    |   2 |          6.1 |         2.9 |          4.7 |         1.4 | Iris-versicolor | Iris-versicolor | {'predictions': {'class': 'Iris-versicolor'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 1.0, 'Iris-virginica': 0.0}}   |
    |   3 |          6.3 |         2.8 |          5.1 |         1.5 | Iris-virginica  | Iris-versicolor | {'predictions': {'class': 'Iris-versicolor'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 0.6, 'Iris-virginica': 0.4}}   |
    |   4 |          6.3 |         3.3 |          4.7 |         1.6 | Iris-versicolor | Iris-versicolor | {'predictions': {'class': 'Iris-versicolor'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 0.99, 'Iris-virginica': 0.01}} |
    |   5 |          6.1 |           3 |          4.9 |         1.8 | Iris-virginica  | Iris-virginica  | {'predictions': {'class': 'Iris-virginica'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 0.32, 'Iris-virginica': 0.68}}  |
    +-----+--------------+-------------+--------------+-------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------+
    5 rows in set (0.00 sec)

    Before MySQL 8.0.32, the ML_PREDICT_TABLE routine does not include options, and the results do not include the ml_results column:

    mysql> CALL sys.ML_PREDICT_TABLE('ml_data.iris_test', @iris_model, 
              'ml_data.iris_predictions');
    
    mysql> SELECT * FROM ml_data.iris_predictions LIMIT 3;
    *************************** 1. row ***************************
    sepal length: 7.3
     sepal width: 2.9
    petal length: 6.3
     petal width: 1.8
      Prediction: Iris-virginica
    *************************** 2. row ***************************
    sepal length: 6.1
     sepal width: 2.9
    petal length: 4.7
     petal width: 1.4
      Prediction: Iris-versicolor
    *************************** 3. row ***************************
    sepal length: 6.3
     sepal width: 2.8
    petal length: 5.1
     petal width: 1.5
      Prediction: Iris-virginica

See also: