Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


6.5.4.2 Generate Predictions for a Table

ML_PREDICT_TABLE generates predictions for an entire table of trained data. Predictions are performed in parallel.

ML_PREDICT_TABLE is a compute intensive process. Depending on your MySQL version, we recommend the following:

  • Before MySQL 9.4.1, limit operations to batches of rows by splitting large tables into smaller tables by using the batch_size option.

  • For MySQL 9.4.1 and later, if ML_PREDICT_TABLE takes a long time to complete, manually limit input tables to a maximum of 1,000 rows.

See ML_PREDICT_TABLE and Track Progress for MySQL HeatWave AutoML Routines to learn more.

Before You Begin
Input Tables and Output Tables

Before MySQL 9.4.1, the output table of predictions must be a new table with a unique name.

As of MySQL 9.4.1, you can specify the output table and the input table as the same table if all the following conditions are met:

  • The input table is not a Lakehouse external table.

  • The input table does not have the columns that are created for the output table when generating predictions. Output columns are specific to each machine learning task. Some of these columns include:

    • Prediction

    • ml_results

  • The input table does not have a primary key, and it does not have a column named _4aad19ca6e_pk_id. This is because ML_PREDICT_TABLE adds a column as the primary key with the name _4aad19ca6e_pk_id to the output table.

  • The input table was not trained with the log_anomaly_detection task.

If you specify the output table and the input table as the same name, the predictions are inserted into the input table.

Prepare to Generate Predictions for a Table

Before running ML_PREDICT_TABLE, you must train, and then load the model you want to use.

  1. The following example trains a dataset with the classification machine learning task.

    mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
  2. The following example loads the trained model.

    mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about training and loading models, see Train a Model and Load a Model.

After training and loading the model, you can generate predictions for a table of data. For parameter and option descriptions, see ML_PREDICT_TABLE.

Generate Predictions for a Table

To generate predictions for a table, define the input table, the model handle, the output table, and any addtional options.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name, [options]);

The following example generates predictions for the entire table in the trained and loaded model.

mysql> CALL sys.ML_PREDICT_TABLE('census_data.census_train', @census_model, 'census_data.census_train_predictions', NULL);

Where:

  • census_data.census_train is the fully qualified name of the test dataset table (schema_name.table_name). The table must have the same feature column names as the training dataset. The target column is not required. If it present in the table, it is not considered when generating predictions.

  • @census_model is the session variable that contains the model handle. Learn more about Model Handles.

  • census_data.census_train_predictions is the output table where predictions are stored. A fully qualified table name must be specified (schema_name.table_name).

  • NULL sets no options to the routine.

When the output table is created, you can query a sample of the table to review predictions.

mysql> SELECT * FROM table_name LIMIT N;

Replace table_name with your own table name, and N with the number of rows from the table you want to view.

The following example queries the top five rows of the output table.

mysql> SELECT * FROM census_train_predictions LIMIT 5;
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass        | fnlwgt | education    | education-num | marital-status     | occupation        | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | ml_results                                                                            |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
|                 1 |  37 | Private          |  99146 | Bachelors    |            13 | Married-civ-spouse | Exec-managerial   | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.58, ">50K": 0.42}} |
|                 2 |  34 | Private          |  27409 | 9th          |             5 | Married-civ-spouse | Craft-repair      | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.76, ">50K": 0.24}} |
|                 3 |  30 | Private          | 299507 | Assoc-acdm   |            12 | Separated          | Other-service     | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.99, ">50K": 0.01}} |
|                 4 |  62 | Self-emp-not-inc | 102631 | Some-college |            10 | Widowed            | Farming-fishing   | Unmarried    | White | Female |            0 |            0 |             50 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.9, ">50K": 0.1}}   |
|                 5 |  51 | Private          | 153486 | Some-college |            10 | Married-civ-spouse | Handlers-cleaners | Husband      | White | Male   |            0 |            0 |             40 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.7, ">50K": 0.3}}   |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
5 rows in set (0.0014 sec)

The predictions and associated probabilities are displayed in the ml_results column. You can compare the predicted revenue values with the real revenue values in the table. If needed, you can refine and train different sets of data to try and generate more reliable predictions.

What's Next