ML_PREDICT_TABLE
generates
predictions for an entire table of trained data. Predictions
are performed in parallel.
ML_PREDICT_TABLE
is a compute
intensive process. Depending on your MySQL version, we
recommend the following:
Before MySQL 9.4.1, limit operations to batches of rows by splitting large tables into smaller tables by using the
batch_size
option.For MySQL 9.4.1 and later, if
ML_PREDICT_TABLE
takes a long time to complete, manually limit input tables to a maximum of 1,000 rows.
See ML_PREDICT_TABLE and Track Progress for MySQL HeatWave AutoML Routines to learn more.
-
Review the following:
Before MySQL 9.4.1, the output table of predictions must be a new table with a unique name.
As of MySQL 9.4.1, you can specify the output table and the input table as the same table if all the following conditions are met:
The input table is not a Lakehouse external table.
-
The input table does not have the columns that are created for the output table when generating predictions. Output columns are specific to each machine learning task. Some of these columns include:
Prediction
ml_results
The input table does not have a primary key, and it does not have a column named
_4aad19ca6e_pk_id
. This is becauseML_PREDICT_TABLE
adds a column as the primary key with the name_4aad19ca6e_pk_id
to the output table.The input table was not trained with the
log_anomaly_detection
task.
If you specify the output table and the input table as the same name, the predictions are inserted into the input table.
Before running
ML_PREDICT_TABLE
, you must
train, and then load the model you want to use.
-
The following example trains a dataset with the classification machine learning task.
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
-
The following example loads the trained model.
mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
For more information about training and loading models, see Train a Model and Load a Model.
After training and loading the model, you can generate predictions for a table of data. For parameter and option descriptions, see ML_PREDICT_TABLE.
To generate predictions for a table, define the input table, the model handle, the output table, and any addtional options.
mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name, [options]);
The following example generates predictions for the entire table in the trained and loaded model.
mysql> CALL sys.ML_PREDICT_TABLE('census_data.census_train', @census_model, 'census_data.census_train_predictions', NULL);
Where:
census_data.census_train
is the fully qualified name of the test dataset table (schema_name.table_name
). The table must have the same feature column names as the training dataset. The target column is not required. If it present in the table, it is not considered when generating predictions.@census_model
is the session variable that contains the model handle. Learn more about Model Handles.census_data.census_train_predictions
is the output table where predictions are stored. A fully qualified table name must be specified (schema_name.table_name
).NULL
sets no options to the routine.
When the output table is created, you can query a sample of the table to review predictions.
mysql> SELECT * FROM table_name LIMIT N;
Replace table_name
with your own
table name, and N
with the number
of rows from the table you want to view.
The following example queries the top five rows of the output table.
mysql> SELECT * FROM census_train_predictions LIMIT 5;
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | ml_results |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.58, ">50K": 0.42}} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.76, ">50K": 0.24}} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.99, ">50K": 0.01}} |
| 4 | 62 | Self-emp-not-inc | 102631 | Some-college | 10 | Widowed | Farming-fishing | Unmarried | White | Female | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.9, ">50K": 0.1}} |
| 5 | 51 | Private | 153486 | Some-college | 10 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.7, ">50K": 0.3}} |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
5 rows in set (0.0014 sec)
The predictions and associated probabilities are displayed
in the ml_results
column. You can compare
the predicted revenue values with the real revenue values in
the table. If needed, you can refine and train different
sets of data to try and generate more reliable predictions.
Review ML_PREDICT_TABLE for parameter descriptions and options.
After generating predictions on a table, learn how to Generate Explanations on a table to get insights into which features have the most influence on the predictions.
Learn how to Generate Predictions for a Row of Data.
Learn how to Score a Model to get insight into the quality of the model.