ML_PREDICT_ROW generates
predictions for one or more rows of unlabeled data specified in
JSON format. Invoke
ML_PREDICT_ROW with a
SELECT statement.
A call to ML_PREDICT_ROW can
include columns that were not present during
ML_TRAIN. A table can include
extra columns, and still use the MySQL HeatWave AutoML model. This allows
side by side comparisons of target column labels, ground truth,
and predictions in the same table.
ML_PREDICT_ROW ignores any extra
columns, and appends them to the results.
ML_PREDICT_ROW does not support
the following model types:
Forecasting
Anomaly detection for logs
Recommendation models trained with the
TwoTowermdoel.
This topic has the following sections.
mysql> SELECT sys.ML_PREDICT_ROW(input_data, model_handle), [options]);
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
['threshold', 'N']
['topk', 'N']
['recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL]
['remove_seen', {'true'|'false'}]
['additional_details', {'true'|'false'}]
}
}
Set the following required parameters:
-
input_data: Define the data to generate predictions for. The column names must match the feature column names in the table used to train the model. You can define the input data in the following ways:Specify a single row of data in
JSONformat.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("column_name", value, "column_name", value, ...), model_handle, options);Run
ML_PREDICT_ROWon multiple rows of data by specifying the columns as key-value pairs inJSONformat and select from a table.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("output_col_name", schema.`input_col_name`, "output_col_name", schema.`input_col_name`, ...), model_handle, options) FROM input_table_name LIMIT N; model_handle: Define the model handle or a session variable that contains the model handle. See Work with Model Handles.
Review the following options in JSON
format.
To view data drift detection values for classification,
regression, and anomaly detection (as of MySQL 9.3.2) models,
set the additional_details option to
true. The ml_results
includes the drift JSON object literal. See
Analyze Data
Drift.
Set the following options as needed for Recommendation models.
topk: Specify the number of recommendations to provide as a positive integer. The default is3.-
recommend: Specify what to recommend.-
ratings: Use this option to predict ratings. This is the default value.The target column is
prediction, and the values arefloat.The input table must contain at least two columns with the same names as the user column and item column from the training model.
-
items: Use this option to recommend items for users.The target column is
item_recommendation, and the values are:JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))The input table must contain at least one column with the same name as the user column from the training model.
-
users: Use this option to recommend users for items.The target column is
user_recommendation, and the values are:JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))The input table must contain at least one column with the same name as the item column from the training model.
users_to_items: This is the same asitems.items_to_users: This is the same asusers.-
items_to_items: Use this option to recommend similar items for items.The target column is
item_recommendation, and the values are:JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))The input table must contain at least contain a column with the same name as the item column from the training model.
-
users_to_users: Use this option to recommend similar users for users.The target column is
user_recommendation, and the values are:JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))The input table must contain at least one column with the same name as the user column from the training model.
-
remove_seen: If the input table overlaps with the training table, andremove_seenistrue, then the model will not repeat existing interactions. The default istrue. Setremove_seentofalseto repeat existing interactions from the training table.
Set the following options as needed for anomaly detection models.
threshold: The threshold you set on anomaly detection models determines which rows in the output table are labeled as anomalies with an anomaly score of1, or normal with an anomaly score of0. The value for the threshold is the degree to which a row of data or log segment is considered for anomaly detection. Any sample with an anomaly score above the threshold is classified an anomaly. The default value is (1 -contamination)-th percentile of all the anomaly scores.
-
The following example generates a prediction on a single row of data. The results include the
ml_resultsfield, which usesJSONformat. Optionally use\Gto display the information in an easily readable format.mysql> SET @row_input = JSON_OBJECT( "age", 25, "workclass", "Private", "fnlwgt", 226802, "education", "11th", "education-num", 7, "marital-status", "Never-married", "occupation", "Machine-op-inspct", "relationship", "Own-child", "race", "Black", "sex", "Male", "capital-gain", 0, "capital-loss", 0, "hours-per-week", 40, "native-country", "United-States"); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @census_model, NULL)\G *************************** 1. row *************************** sys.ML_PREDICT_ROW(@row_input, @census_model, NULL): { "age": 25, "sex": "Male", "race": "Black", "fnlwgt": 226802, "education": "11th", "workclass": "Private", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.0032, "<=50K": 0.9968 } }, "occupation": "Machine-op-inspct", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 7, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States" } 1 row in set (2.2218 sec) -
The following example generates predictions on two rows of data from the input table. Optionally use
\Gto display the information in an easily readable format.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marital-status`, "occupation", census_train.`occupation`, "relationship", census_train.`relationship`, "race", census_train.`race`, "sex", census_train.`sex`, "capital-gain", census_train.`capital-gain`, "capital-loss", census_train.`capital-loss`, "hours-per-week", census_train.`hours-per-week`, "native-country", census_train.`native-country`), @census_model, NULL)FROM census_data.census_train LIMIT 2\G *************************** 1. row *************************** sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marita: { "age": 62, "sex": "Female", "race": "White", "fnlwgt": 123582, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.0106, "<=50K": 0.9894 } }, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 6, "hours-per-week": 40, "marital-status": "Divorced", "native-country": "United-States" } *************************** 2. row *************************** sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_train.`age`, "workclass", census_train.`workclass`, "fnlwgt", census_train.`fnlwgt`, "education", census_train.`education`, "education-num", census_train.`education-num`, "marital-status", census_train.`marita: { "age": 32, "sex": "Female", "race": "White", "fnlwgt": 174215, "education": "Bachelors", "workclass": "Federal-gov", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.3249, "<=50K": 0.6751 } }, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 13, "hours-per-week": 60, "marital-status": "Never-married", "native-country": "United-States" } 2 rows in set (9.6548 sec) -
The following example uses explicit feedback and runs the
ML_PREDICT_ROWroutine to predict the top 3 items that a particular user will like with theusers_to_itemsoption.mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3)); +----------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3)) | +----------------------------------------------------------------------------------------------------------------------+ | {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483", "64"], "rating": [4.06, 4.05, 4.04]}}"} | +----------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.2811 sec) -
The following example generates predictions on ten rows from an input table. The
additional_detailsparameter is set toTRUE, so you can review data drift details.mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_test.`age`, "workclass", census_test.`workclass`, "fnlwgt", census_test.`fnlwgt`, "education", census_test.`education`, "education-num", census_test.`education-num`, "marital-status", census_test.`marital-status`, "occupation", census_test.`occupation`, "relationship", census_test.`relationship`, "race", census_test.`race`, "sex", census_test.`sex`, "capital-gain", census_test.`capital-gain`, "capital-loss", census_test.`capital-loss`, "hours-per-week", census_test.`hours-per-week`, "native-country", census_test.`native-country`), @census_model, JSON_OBJECT('additional_details', TRUE))FROM census_data.census_test LIMIT 10; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW(JSON_OBJECT( "age", census_test.`age`, "workclass", census_test.`workclass`, "fnlwgt", census_test.`fnlwgt`, "education", census_test.`education`, "education-num", census_test.`education-num`, "ma | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"age": 37, "sex": "Male", "race": "White", "fnlwgt": 99146, "education": "Bachelors", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "fnlwgt": 46.67, "capital-gain": 0.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.42, "<=50K": 0.58}}, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 1977, "relationship": "Husband", "education-num": 13, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 34, "sex": "Male", "race": "White", "fnlwgt": 27409, "education": "9th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.1, "attribution_percent": {"fnlwgt": 25.0, "education": 33.31, "workclass": 16.22}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Craft-repair", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 5, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 30, "sex": "Female", "race": "White", "fnlwgt": 299507, "education": "Assoc-acdm", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.26, "attribution_percent": {"relationship": 21.36, "education-num": 28.33, "hours-per-week": 33.21}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.01, "<=50K": 0.99}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 12, "hours-per-week": 40, "marital-status": "Separated", "native-country": "United-States"} | | {"age": 62, "sex": "Female", "race": "White", "fnlwgt": 102631, "education": "Some-college", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"fnlwgt": 5.93, "relationship": 26.58, "hours-per-week": 35.69}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.1, "<=50K": 0.9}}, "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 10, "hours-per-week": 50, "marital-status": "Widowed", "native-country": "United-States"} | | {"age": 51, "sex": "Male", "race": "White", "fnlwgt": 153486, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"sex": 7.84, "workclass": 7.84, "education-num": 83.96}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.3, "<=50K": 0.7}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 10, "hours-per-week": 40, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 34, "sex": "Male", "race": "Black", "fnlwgt": 434292, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 2.46, "attribution_percent": {"education": 12.36, "relationship": 22.07, "education-num": 19.92}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.33, "<=50K": 0.67}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 30, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | | {"age": 28, "sex": "Male", "race": "White", "fnlwgt": 240172, "education": "Masters", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.23, "attribution_percent": {"sex": 17.41, "fnlwgt": 21.67, "workclass": 17.41}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Prof-specialty", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 14, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} | | {"age": 56, "sex": "Male", "race": "White", "fnlwgt": 219426, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.06, "attribution_percent": {"age": 27.74, "race": 22.22, "education-num": 25.1}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.13, "<=50K": 0.87}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 6, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} | | {"age": 46, "sex": "Female", "race": "White", "fnlwgt": 295791, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"race": 9.66, "capital-gain": 41.59, "marital-status": 38.47}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.17, "<=50K": 0.83}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 9, "hours-per-week": 30, "marital-status": "Divorced", "native-country": "United-States"} | | {"age": 46, "sex": "Male", "race": "White", "fnlwgt": 114032, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "capital-gain": 0.0, "education-num": 100.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.34, "<=50K": 0.66}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 1887, "relationship": "Husband", "education-num": 10, "hours-per-week": 45, "marital-status": "Married-civ-spouse", "native-country": "United-States"} | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (6.8109 sec) -
The following example uses a recommendation model with implicit feedback to predict the ranking for a particular user and item.
mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL); +---------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL) | +---------------------------------------------------------------------------------------+ | {"item_id": "226", "user_id": "836", "ml_results": {"predictions": {"rating": 2.46}}} | +---------------------------------------------------------------------------------------+ 1 row in set (0.1390 sec) -
The following example uses a recommendation model with explicit feedback to predict the top two items that a particular user will like with the
users_to_itemsoption.mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2)); +----------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2)) | +----------------------------------------------------------------------------------------------------------------------+ | {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483"], "rating": [4.06, 4.05]}}"} | +----------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.2811 sec) -
The following example uses a recommendation model with explicit feedback to predict the top two items similar to another item.
mysql> SELECT sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2)); +------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2)) | +------------------------------------------------------------------------------------------------------------------------+ | {"item_id": "524", "ml_results": "{"predictions": {"item_id": ["665", "633"], "similarity": [1.0, 1.0]}}"} | +------------------------------------------------------------------------------------------------------------------------+