ML_PREDICT_ROW
generates
predictions for one or more rows of unlabeled data specified in
JSON
format. Invoke
ML_PREDICT_ROW
with a
SELECT
statement.
A call to ML_PREDICT_ROW
can
include columns that were not present during
ML_TRAIN
. A table can include
extra columns, and still use the MySQL HeatWave AutoML model. This allows
side by side comparisons of target column labels, ground truth,
and predictions in the same table.
ML_PREDICT_ROW
ignores any extra
columns, and appends them to the results.
ML_PREDICT_ROW
does not support
the following model types:
Forecasting
Anomaly detection for logs
This topic has the following sections.
mysql> SELECT sys.ML_PREDICT_ROW(input_data, model_handle), [options]);
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
['threshold', 'N']
['topk', 'N']
['recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL]
['remove_seen', {'true'|'false'}]
['additional_details', {'true'|'false'}]
}
}
Set the following required parameters:
-
input_data
: Define the data to generate predictions for. The column names must match the feature column names in the table used to train the model. You can define the input data in the following ways:Specify a single row of data in
JSON
format.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("column_name", value, "column_name", value, ...), model_handle, options);
Run
ML_PREDICT_ROW
on multiple rows of data by specifying the columns as key-value pairs inJSON
format and select from a table.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("output_col_name", schema.`input_col_name`, "output_col_name", schema.`input_col_name`, ...), model_handle, options) FROM input_table_name LIMIT N;
model_handle
: Define the model handle or a session variable that contains the model handle. See Work with Model Handles.
Review the following options in JSON
format.
To view data drift detection values for classification,
regression, and anomaly detection (as of MySQL 9.3.2) models,
set the additional_details
option to
true
. The ml_results
includes the drift
JSON object literal. See
Analyze Data
Drift.
Set the following options as needed for Recommendation models.
topk
: Specify the number of recommendations to provide as a positive integer. The default is3
.-
recommend
: Specify what to recommend.-
ratings
: Use this option to predict ratings. This is the default value.The target column is
prediction
, and the values arefloat
.The input table must contain at least two columns with the same names as the user column and item column from the training model.
-
items
: Use this option to recommend items for users.The target column is
item_recommendation
, and the values are:JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
The input table must contain at least one column with the same name as the user column from the training model.
-
users
: Use this option to recommend users for items.The target column is
user_recommendation
, and the values are:JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
The input table must contain at least one column with the same name as the item column from the training model.
users_to_items
: This is the same asitems
.items_to_users
: This is the same asusers
.-
items_to_items
: Use this option to recommend similar items for items.The target column is
item_recommendation
, and the values are:JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))
The input table must contain at least contain a column with the same name as the item column from the training model.
-
users_to_users
: Use this option to recommend similar users for users.The target column is
user_recommendation
, and the values are:JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))
The input table must contain at least one column with the same name as the user column from the training model.
-
remove_seen
: If the input table overlaps with the training table, andremove_seen
istrue
, then the model will not repeat existing interactions. The default istrue
. Setremove_seen
tofalse
to repeat existing interactions from the training table.
Set the following options as needed for anomaly detection models.
threshold
: The threshold you set on anomaly detection models determines which rows in the output table are labeled as anomalies with an anomaly score of1
, or normal with an anomaly score of0
. The value for the threshold is the degree to which a row of data or log segment is considered for anomaly detection. Any sample with an anomaly score above the threshold is classified an anomaly. The default value is (1 -contamination
)-th percentile of all the anomaly scores.
-
The following example generates a prediction on a single row of data. The results include the
ml_results
field, which usesJSON
format. Optionally use\G
to display the information in an easily readable format.mysql> SET @row_input = JSON_OBJECT( "age", 25, "workclass", "Private", "fnlwgt", 226802, "education", "11th", "education-num", 7, "marital-status", "Never-married", "occupation", "Machine-op-inspct", "relationship", "Own-child", "race", "Black", "sex", "Male", "capital-gain", 0, "capital-loss", 0, "hours-per-week", 40, "native-country", "United-States"); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @census_model, NULL)\G *************************** 1. row *************************** sys.ML_PREDICT_ROW(@row_input, @census_model, NULL): { "age": 25, "sex": "Male", "race": "Black", "fnlwgt": 226802, "education": "11th", "workclass": "Private", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.0032, "<=50K": 0.9968 } }, "occupation": "Machine-op-inspct", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 7, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States" } 1 row in set (2.2218 sec)
-
The following example generates predictions on two rows of data from the input table. Optionally use
\G
to display the information in an easily readable format.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marital-status`, "occupation", census_train.`occupation`, "relationship", census_train.`relationship`, "race", census_train.`race`, "sex", census_train.`sex`, "capital-gain", census_train.`capital-gain`, "capital-loss", census_train.`capital-loss`, "hours-per-week", census_train.`hours-per-week`, "native-country", census_train.`native-country`), @census_model, NULL)FROM census_data.census_train LIMIT 2\G *************************** 1. row *************************** sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marita: { "age": 62, "sex": "Female", "race": "White", "fnlwgt": 123582, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.0106, "<=50K": 0.9894 } }, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 6, "hours-per-week": 40, "marital-status": "Divorced", "native-country": "United-States" } *************************** 2. row *************************** sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marita: { "age": 32, "sex": "Female", "race": "White", "fnlwgt": 174215, "education": "Bachelors", "workclass": "Federal-gov", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.3249, "<=50K": 0.6751 } }, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 13, "hours-per-week": 60, "marital-status": "Never-married", "native-country": "United-States" } 2 rows in set (9.6548 sec)
-
The following example uses explicit feedback and runs the
ML_PREDICT_ROW
routine to predict the top 3 items that a particular user will like with theusers_to_items
option.mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3)); +----------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3)) | +----------------------------------------------------------------------------------------------------------------------+ | {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483", "64"], "rating": [4.06, 4.05, 4.04]}}"} | +----------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.2811 sec)
-
The following example generates predictions on ten rows from an input table. The
additional_details
parameter is set toTRUE, so you can review data drift details.
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_test.`age`, "workclass", census_test.`workclass`, "fnlwgt", census_test.`fnlwgt`, "education", census_test.`education`, "education-num", census_test.`education-num`, "marital-status", census_test.`marital-status`, "occupation", census_test.`occupation`, "relationship", census_test.`relationship`, "race", census_test.`race`, "sex", census_test.`sex`, "capital-gain", census_test.`capital-gain`, "capital-loss", census_test.`capital-loss`, "hours-per-week", census_test.`hours-per-week`, "native-country", census_test.`native-country`), @census_model, JSON_OBJECT('additional_details', TRUE))FROM census_data.census_test LIMIT 10; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_test.`age`, "workclass", census_test.`workclass`, "fnlwgt", census_test.`fnlwgt`, "education", census_test.`education`, "education-num", census_test.`education-num`, "ma | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"age": 37, "sex": "Male", "race": "White", "fnlwgt": 99146, "education": "Bachelors", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "fnlwgt": 46.67, "capital-gain": 0.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.42, "<=50K": 0.58}}, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 1977, "relationship": "Husband", "education-num": 13, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 34, "sex": "Male", "race": "White", "fnlwgt": 27409, "education": "9th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.1, "attribution_percent": {"fnlwgt": 25.0, "education": 33.31, "workclass": 16.22}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Craft-repair", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 5, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 30, "sex": "Female", "race": "White", "fnlwgt": 299507, "education": "Assoc-acdm", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.26, "attribution_percent": {"relationship": 21.36, "education-num": 28.33, "hours-per-week": 33.21}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.01, "<=50K": 0.99}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 12, "hours-per-week": 40, "marital-status": "Separated", "native-country": "United-States"} | | {"age": 62, "sex": "Female", "race": "White", "fnlwgt": 102631, "education": "Some-college", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"fnlwgt": 5.93, "relationship": 26.58, "hours-per-week": 35.69}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.1, "<=50K": 0.9}}, "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 10, "hours-per-week": 50, "marital-status": "Widowed", "native-country": "United-States"} | | {"age": 51, "sex": "Male", "race": "White", "fnlwgt": 153486, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"sex": 7.84, "workclass": 7.84, "education-num": 83.96}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.3, "<=50K": 0.7}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 10, "hours-per-week": 40, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 34, "sex": "Male", "race": "Black", "fnlwgt": 434292, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 2.46, "attribution_percent": {"education": 12.36, "relationship": 22.07, "education-num": 19.92}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.33, "<=50K": 0.67}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 30, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 28, "sex": "Male", "race": "White", "fnlwgt": 240172, "education": "Masters", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.23, "attribution_percent": {"sex": 17.41, "fnlwgt": 21.67, "workclass": 17.41}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Prof-specialty", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 14, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} | | {"age": 56, "sex": "Male", "race": "White", "fnlwgt": 219426, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.06, "attribution_percent": {"age": 27.74, "race": 22.22, "education-num": 25.1}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.13, "<=50K": 0.87}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 6, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} | | {"age": 46, "sex": "Female", "race": "White", "fnlwgt": 295791, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"race": 9.66, "capital-gain": 41.59, "marital-status": 38.47}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.17, "<=50K": 0.83}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 9, "hours-per-week": 30, "marital-status": "Divorced", "native-country": "United-States"} | | {"age": 46, "sex": "Male", "race": "White", "fnlwgt": 114032, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "capital-gain": 0.0, "education-num": 100.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.34, "<=50K": 0.66}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 1887, "relationship": "Husband", "education-num": 10, "hours-per-week": 45, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (6.8109 sec)
-
The following example uses a recommendation model with implicit feedback to predict the ranking for a particular user and item.
mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL); +---------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL) | +---------------------------------------------------------------------------------------+ | {"item_id": "226", "user_id": "836", "ml_results": {"predictions": {"rating": 2.46}}} | +---------------------------------------------------------------------------------------+ 1 row in set (0.1390 sec)
-
The following example uses a recommendation model with explicit feedback to predict the top two items that a particular user will like with the
users_to_items
option.mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2)); +----------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2)) | +----------------------------------------------------------------------------------------------------------------------+ | {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483"], "rating": [4.06, 4.05]}}"} | +----------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.2811 sec)
-
The following example uses a recommendation model with explicit feedback to predict the top two items similar to another item.
mysql> SELECT sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2)); +------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2)) | +------------------------------------------------------------------------------------------------------------------------+ | {"item_id": "524", "ml_results": "{"predictions": {"item_id": ["665", "633"], "similarity": [1.0, 1.0]}}"} | +------------------------------------------------------------------------------------------------------------------------+