Related Documentation Download this Manual
PDF (US Ltr) - 1.6Mb
PDF (A4) - 1.6Mb


MySQL HeatWave User Guide  /  HeatWave AutoML  /  Training a Model

3.5 Training a Model

Run the ML_TRAIN routine on a training dataset to produce a trained machine learning model.

ML_TRAIN supports training of classification, regression, and forecasting models. Use a classification model to predict discrete values. Use a regression model to predict continuous values. Use a forecasting model to create timeseries forecasts for date and time data.

The time required to train a model can take a few minutes to a few hours depending on the number of rows and columns in the dataset, specified ML_TRAIN parameters, and the size of the HeatWave Cluster. HeatWave AutoML supports tables up to 10 GB in size with a maximum of 100 million rows and or 1017 columns. Before MySQL 8.0.29, the column limit was 900.

ML_TRAIN stores machine learning models in the MODEL_CATALOG table. See Section 3.13.1, “The Model Catalog”.

For ML_TRAIN option descriptions, see Section 3.15.1, “ML_TRAIN”.

The training dataset used with ML_TRAIN must reside in a table on the MySQL DB System. For an example training dataset, see Section 3.4.4, “Example Data”.

The following example runs ML_TRAIN on the heatwaveml_bench.census_train training dataset:

mysql> CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue', 
          JSON_OBJECT('task', 'classification'), @census_model);

Where:

  • heatwaveml_bench.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

  • revenue is the name of the target column, which contains ground truth values.

  • JSON_OBJECT('task', 'classification') specifies the machine learning task type.

    Specify NULL instead of JSON options to use the default classification task type.

    When using the regression task type, only a numeric target column is permitted.

    For the forecasting task type, see Section 3.9, “Forecasting”.

    For the anomaly_detection task type, see Section 3.10, “Anomaly Detection”

    For the recommendation task type, see Section 3.11, “Recommendations”

  • @census_model is the name of the user-defined session variable that stores the model handle for the duration of the connection. User variables are written as @var_name. Some of the examples in this guide use @census_model as the variable name. Any valid name for a user-defined variable is permitted, for example @my_model).

After ML_TRAIN trains a model, the model is stored in the model catalog. To retrieve the generated model handle, query the specified session variable; for example:

mysql> SELECT @census_model;
+--------------------------------------------------+
| @census_model                                    |
+--------------------------------------------------+
| heatwaveml_bench.census_train_user1_1636729526   |
+--------------------------------------------------+
Tip

While using the same connection used to execute ML_TRAIN, specify the session variable, for example @census_model, in place of the model handle in other HeatWave AutoML routines, but the session variable data is lost when the current session is terminated. If you need to look up a model handle, you can do so by querying the model catalog table. See Section 3.13.8, “Model Handles”.

The quality and reliability of a trained model can be assessed using the ML_SCORE routine. For more information, see Section 3.13.6, “Scoring Models”. As of MySQL 8.0.30, ML_TRAIN displays the following message if a trained model has a low score: Model Has a low training score, expect low quality model explanations.