Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


HeatWave User Guide  /  ...  /  ML_PREDICT_ROW

3.16.5 ML_PREDICT_ROW

ML_PREDICT_ROW generates predictions for one or more rows of unlabeled data specified in JSON format. Invoke ML_PREDICT_ROW with a SELECT statement.

ML_PREDICT_ROW requires a loaded model to run. See Section 3.14.3, “Loading Models”.

ML_PREDICT_ROW supports data drift detection for classification and regression models with the following:

  • The options parameter includes the additional_details boolean value.

  • The ml_results column includes the drift JSON object literal.

See: Section 3.14.11, “Data Drift Detection”.

ML_PREDICT_ROW Syntax

mysql> SELECT sys.ML_PREDICT_ROW(input_data, model_handle), [options]);
 
options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ['threshold', 'N']
          ['topk', 'N']
          ['recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL]
          ['remove_seen', {'true'|'false'}]
          ['additional_details', {'true'|'false'}]
          }
}

A call to ML_PREDICT_ROW can include columns that were not present during ML_TRAIN. A table can include extra columns, and still use the HeatWave AutoML model. This allows side by side comparisons of target column labels, ground truth, and predictions in the same table. ML_PREDICT_ROW ignores any extra columns, and appends them to the results.

ML_PREDICT_ROW parameters:

  • input_data: Specifies the data to generate predictions for.

    Specify a single row of data in JSON format:

    mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("column_name", value, 
              "column_name", value, ...), 
              model_handle, options);

    To run ML_PREDICT_ROW on multiple rows of data, specify the columns as key-value pairs in JSON format and select from a table:

    mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("output_col_name", schema.`input_col_name`, 
              "output_col_name", schema.`input_col_name`, ...), 
              model_handle, options) 
              FROM input_table_name LIMIT N;
  • model_handle: Specifies the model handle or a session variable that contains the model handle.

  • options: A set of options in JSON format.

    This parameter only supports the recommendation and anomaly_detection tasks. For all other tasks, set this parameter to NULL.

    • threshold: The optional threshold for use with the anomaly_detection task to convert anomaly scores to 1: an anomaly or 0: normal. 0 < threshold < 1. The default value is (1 - contamination)-th percentile of all the anomaly scores.

    • topk: Use with the recommendation task to specify the number of recommendations to provide. A positive integer. The default is 3.

    • recommend: Use with the recommendation task to specify what to recommend. Permitted values are:

      • ratings: Use this option to predict ratings. This is the default value.

        The target column is prediction, and the values are float.

        The input table must contain at least two columns with the same names as the user column and item column from the training model.

      • items: Use this option to recommend items for users.

        The target column is item_recommendation, and the values are:

        JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), 
                    "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))

        The input table must at least contain a column with the same name as the user column from the training model.

      • users: Use this option to recommend users for items.

        The target column is user_recommendation, and the values are:

        JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), 
                    "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))

        The input table must at least contain a column with the same name as the item column from the training model.

      • users_to_items: This is the same as items.

      • items_to_users: This is the same as users.

      • items_to_items: Use this option to recommend similar items for items.

        The target column is item_recommendation, and the values are:

        JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))

        The input table must at least contain a column with the same name as the item column from the training model.

      • users_to_users: Use this option to recommend similar users for users.

        The target column is user_recommendation, and the values are:

        JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))

        The input table must at least contain a column with the same name as the user column from the training model.

    • remove_seen: If the input table overlaps with the training table, and remove_seen is true, then the model will not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.

    • additional_details: Set to true for ml_results to include the JSON object literal, drift.

Syntax Examples

  • To run ML_PREDICT_ROW on a single row of data use a select statement. The results include the ml_results field, which uses JSON format:

    mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("sepal length", 7.3, "sepal width", 2.9, 
              "petal length", 6.3, "petal width", 1.8), @iris_model, NULL);
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | sys.ML_PREDICT_ROW('{"sepal length": 7.3, "sepal width": 2.9, "petal length": 6.3, "petal width": 1.8}', @iris_model, NULL)                                                                                                                                              |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | {"Prediction": "Iris-virginica", "ml_results": "{'predictions': {'class': 'Iris-virginica'}, 'probabilities': {'Iris-setosa': 0.0, 'Iris-versicolor': 0.13, 'Iris-virginica': 0.87}}", "petal width": 1.8, "sepal width": 2.9, "petal length": 6.3, "sepal length": 7.3} |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (1.12 sec)
  • To run ML_PREDICT_ROW on five rows of data selected from an input table:

    mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("sepal length", iris_test.`sepal length`, 
              "sepal width", iris_test.`sepal width`, "petal length", iris_test.`petal length`, 
              "petal width", iris_test.`petal width`), @iris_model, NULL) 
              FROM ml_data.iris_test LIMIT 5;

See also: