ML_PREDICT_TABLE
generates predictions for an entire table of trained data and
saves the results to an output table. Predictions are
performed in parallel.
ML_PREDICT_TABLE
is a compute intensive process. Limiting operations to batches
of 10 to 100 rows by splitting large tables into smaller
tables is recommended. Use batch processing with the
batch_size
option.
-
Review the following:
Before running
ML_PREDICT_TABLE
,
you must train, and then load the model you want to use.
-
The following example trains a dataset with the classification machine learning task.
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
-
The following example loads the trained model.
mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
For more information about training and loading models, see Train a Model and Load a Model.
After training and loading the model, you can generate predictions for a table of data. For parameter and option descriptions, see ML_PREDICT_TABLE.
To generate predictions for a table, define the input table, the model handle, the output table, and any additional options.
mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name), [options]);
The following example generates predictions for the entire table in the trained and loaded model.
mysql> CALL sys.ML_PREDICT_TABLE('census_data.census_train', @census_model, 'census_data.census_train_predictions', NULL);
Where:
census_data.census_train
is the fully qualified name of the test dataset table (schema_name.table_name
). The table must have the same feature column names as the training dataset. The target column is not required. If it present in the table, it is not considered when generating predictions.@census_model
is the session variable that contains the model handle. Learn more about Model Handles.census_data.census_train_predictions
is the output table where predictions are stored. A fully qualified table name must be specified (schema_name.table_name
). If the table already exists, an error is returned.NULL
sets no options to the routine.
When the output table is created, you can query a sample of the table to review predictions.
mysql> SELECT * FROM table_name LIMIT N;
Replace table_name
with your own
table name, and N
with the number
of rows from the table you want to view.
The following example queries the top five rows of the output table.
mysql> SELECT * FROM census_train_predictions LIMIT 5;
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | ml_results |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.58, ">50K": 0.42}} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.76, ">50K": 0.24}} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.99, ">50K": 0.01}} |
| 4 | 62 | Self-emp-not-inc | 102631 | Some-college | 10 | Widowed | Farming-fishing | Unmarried | White | Female | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.9, ">50K": 0.1}} |
| 5 | 51 | Private | 153486 | Some-college | 10 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.7, ">50K": 0.3}} |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
5 rows in set (0.0014 sec)
The predictions and associated probabilities are displayed
in the ml_results
column. You can compare
the predicted revenue values with the real revenue values in
the table. If needed, you can refine and train different
sets of data to try and generate more reliable predictions.
Review ML_PREDICT_TABLE for parameter descriptions and options.
After generating predictions on a table, learn how to Generate Explanations on a table to get insights into which features have the most influence on the predictions.
Learn how to Generate Predictions for a Row of Data.
Learn how to Score a Model to get insight into the quality of the model.