MySQL HeatWave User Guide  /  HeatWave ML  /  Predictions

3.6 Predictions

Predictions are generated by running ML_PREDICT_ROW or ML_PREDICT_TABLE on unlabeled data; that is, it must have the same feature columns as the data used to train the model but no target column.

ML_PREDICT_ROW generates predictions for one or more rows of data. ML_PREDICT_TABLE generates predictions for an entire table of data and saves the results to an output table.

Row Predictions

ML_PREDICT_ROW generates predictions for one or more rows of data specified in JSON format. It is invoked using a SELECT statement. For ML_PREDICT_ROW parameter descriptions, see Section 3.10.4, “ML_PREDICT_ROW”.

Before running ML_PREDICT_ROW, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.9.3, “Loading Models”.

The following example runs ML_PREDICT_ROW on a single row of unlabeled data, which is assigned to a @row_input session variable:

SET @row_input = JSON_OBJECT( 
"age", 25, 
"workclass", "Private", 
"fnlwgt", 226802, 
"education", "11th", 
"education-num", 7, 
"marital-status", "Never-married", 
"occupation", "Machine-op-inspct", 
"relationship", "Own-child", 
"race", "Black", 
"sex", "Male", 
"capital-gain", 0, 
"capital-loss", 0, 
"hours-per-week", 40, 
"native-country", "United-States"); 

SELECT sys.ML_PREDICT_ROW(@row_input, @census_model);

where:

  • @row_input is a session variable containing a row of unlabeled data. The data is specified in JSON key-value format. The column names must match the feature column names in the training dataset.

  • @census_model is the session variable that contains the model handle.

ML_PREDICT_ROW returns a JSON object containing a "Prediction" key with the predicted value and the features values used to make the prediction.

You can also run ML_PREDICT_ROW on multiple rows of data selected from a table. For an example, refer to the syntax examples in Section 3.10.4, “ML_PREDICT_ROW”.

Table Predictions

ML_PREDICT_TABLE generates predictions for an entire table of unlabeled data and saves the results to an output table. Predictions are performed in parallel. For ML_PREDICT_TABLE parameter descriptions, see Section 3.10.5, “ML_PREDICT_TABLE”.

ML_PREDICT_TABLE is a compute intensive process. Limiting operations to batches of 10 to 100 rows by splitting large tables into smaller tables is recommended.

Before running ML_PREDICT_TABLE, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.9.3, “Loading Models”.

The following example creates a table with 10 rows of unlabeled test data and generates predictions for that table:

CREATE TABLE heatwaveml_bench.census_test_subset AS SELECT * FROM heatwaveml_bench.census_test 
LIMIT 10;  
  
CALL sys.ML_PREDICT_TABLE('heatwaveml_bench.census_test_subset', @census_model, 
'heatwaveml_bench.census_predictions');

where:

  • heatwaveml_bench.census_test_subset is the fully qualified name of the test dataset table (schema_name.table_name). The table must have the same feature column names as the training dataset but no target column.

  • @census_model is the session variable that contains the model handle.

  • heatwaveml_bench.census_predictions is the output table where predictions are stored. The table is created if it does not exist. A fully qualified table name must be specified (schema_name.table_name). If the table already exists, an error is returned.

To view ML_PREDICT_TABLE results, query the output table; for example:

SELECT * FROM heatwaveml_bench.census_predictions;

ML_PREDICT_TABLE populates the output table with predictions and the features used to make each prediction.