MySQL :: HeatWave User Guide :: 3.5 Training a Model

HeatWave User Guide / HeatWave AutoML / Training a Model

3.5 Training a Model

3.5.1 Advanced ML_TRAIN Options

Run the ML_TRAIN routine on a training dataset to produce a trained machine learning model.

ML_TRAIN supports training of the following models:

Classification
Regression
Forecasting
Anomaly Detection
Recommendation
Topic Modeling

The time required to train a model can take a few minutes to a few hours depending on the number of rows and columns in the dataset, specified ML_TRAIN parameters, and the size of the HeatWave Cluster. HeatWave AutoML supports tables up to 10 GB in size with a maximum of 100 million rows and or 1017 columns.

ML_TRAIN stores machine learning models in the MODEL_CATALOG table. See Section 3.14.1, “The Model Catalog”.

For ML_TRAIN option descriptions, see Section 3.16.1, “ML_TRAIN”.

The training dataset used with ML_TRAIN must reside in a table on the MySQL DB System. For an example training dataset, see Section 3.4.4, “Example Data”.

The following example runs ML_TRAIN on the heatwaveml_bench.census_train training dataset:

Press CTRL+C to copy
 
mysql>CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue', 
          JSON_OBJECT('task', 'classification'), @census_model);

Where:

heatwaveml_bench.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).
revenue is the name of the target column, which contains ground truth values.
JSON_OBJECT('task', 'classification') specifies the machine learning task type.

Specify NULL instead of JSON options to use the default classification task type.

When using the regression task type, only a numeric target column is permitted.

For the forecasting task type, see Section 3.9, “Forecasting”.

For the anomaly_detection task type, see Section 3.10, “Anomaly Detection”

For the recommendation task type, see Section 3.11, “Recommendations”
@census_model is the name of the user-defined session variable that stores the model handle for the duration of the connection. User variables are written as @var_name. Some of the examples in this guide use @census_model as the variable name. Any valid name for a user-defined variable is permitted, for example @my_model).

After ML_TRAIN trains a model, the model is stored in the model catalog. To retrieve the generated model handle, query the specified session variable; for example:

Press CTRL+C to copy
 
mysql>SELECT @census_model;
+--------------------------------------------------+
| @census_model                                    |
+--------------------------------------------------+
| heatwaveml_bench.census_train_user1_1636729526   |
+--------------------------------------------------+

Tip

While using the same connection used to execute ML_TRAIN, specify the session variable, for example @census_model, in place of the model handle in other HeatWave AutoML routines, but the session variable data is lost when the current session is terminated. If you need to look up a model handle, you can do so by querying the model catalog table. See Section 3.14.8, “Model Handles”.

The quality and reliability of a trained model can be assessed using the ML_SCORE routine. For more information, see Section 3.14.6, “Scoring Models”. ML_TRAIN displays the following message if a trained model has a low score: Model Has a low training score, expect low quality model explanations.

PREV HOME UP NEXT