MySQL :: MySQL AI 9.4 :: 4.5.2 Training a Model

Before You Begin

Review how to Prepare Data.
Review Additional AutoML Requirements.

ML_TRAIN Overview

ML_TRAIN supports training of the following models:

Classification: Assign items to defined categories.
Regression: Generate a prediction based on the relationship between a dependent variable and one or more independent variables.
Forecasting: Use a time series dataset to generate forecasts.
Anomaly Detection: Detect unusual patterns in data.
Recommendation: Generate user and product recommendations.
Topic Modeling: Generate words and similar expressions that best characterize a set of documents.
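The `task` option passed to ML_TRAIN selects one of the model types above. The following Python sketch renders that option as a `JSON_OBJECT` expression. It assumes the task values are the lowercase, underscore-separated forms of the names above (for example, `anomaly_detection`), which you should confirm in the ML_TRAIN reference. The helper `task_option` is hypothetical, and some tasks, such as forecasting, may need further options that it does not cover.

```python
# Hypothetical helper: builds the JSON_OBJECT('task', ...) argument for ML_TRAIN.
# Assumption: task values are the lowercase, underscore-separated task names.
SUPPORTED_TASKS = {
    "classification",
    "regression",
    "forecasting",
    "anomaly_detection",
    "recommendation",
    "topic_modeling",
}


def task_option(task_name: str) -> str:
    """Return a JSON_OBJECT expression selecting the given ML_TRAIN task."""
    # Normalize display names such as "Anomaly Detection" to "anomaly_detection".
    value = task_name.strip().lower().replace(" ", "_")
    # Only values from the fixed set reach the SQL text, so nothing
    # user-supplied is interpolated unchecked.
    if value not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported ML_TRAIN task: {task_name!r}")
    return f"JSON_OBJECT('task', '{value}')"


print(task_option("Anomaly Detection"))  # JSON_OBJECT('task', 'anomaly_detection')
```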

The training dataset used with ML_TRAIN must reside in a table on the MySQL server.

ML_TRAIN stores machine learning models in the MODEL_CATALOG table. See The Model Catalog to learn more.

Training a model can take anywhere from a few minutes to a few hours, depending on the following:

The number of rows and columns in the dataset. AutoML supports tables up to 10 GB in size, with at most 100 million rows and at most 1017 columns.
The specified ML_TRAIN parameters.
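Before starting a long training run, you can check a table against the AutoML size limits stated above. This Python sketch is hypothetical: it reads "10 GB" as 10 * 1024**3 bytes (10 GiB), which is an assumption, and it expects you to supply the table size, row count, and column count yourself, for example from `information_schema.TABLES` (`DATA_LENGTH`, `TABLE_ROWS`) and `information_schema.COLUMNS`. For InnoDB tables, `TABLE_ROWS` is only an estimate.

```python
# Hypothetical pre-flight check against the AutoML limits described above.
# Assumption: "10 GB" is interpreted as 10 GiB (10 * 1024**3 bytes).
MAX_TABLE_BYTES = 10 * 1024**3
MAX_ROWS = 100_000_000
MAX_COLUMNS = 1017


def automl_limit_violations(table_bytes: int, rows: int, columns: int) -> list:
    """Return the names of the limits the table exceeds; empty means it fits."""
    violations = []
    if table_bytes > MAX_TABLE_BYTES:
        violations.append("size")
    if rows > MAX_ROWS:
        violations.append("rows")
    if columns > MAX_COLUMNS:
        violations.append("columns")
    return violations


print(automl_limit_violations(2 * 1024**3, 5_000_000, 15))     # []
print(automl_limit_violations(12 * 1024**3, 150_000_000, 20))  # ['size', 'rows']
```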

To learn more about ML_TRAIN requirements and options, see ML_TRAIN or Machine Learning Use Cases.

The quality and reliability of a trained model can be assessed using the ML_SCORE routine. For more information, see Score a Model. ML_TRAIN displays the following message if a trained model has a low score: Model Has a low training score, expect low quality model explanations.
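The following Python sketch shows one way to run ML_SCORE from an application through a DB-API cursor, such as one from mysql-connector-python. It assumes the six-argument form `ML_SCORE(table_name, target_column_name, model_handle, metric, score, options)`, with the score returned in a session variable, and uses the `balanced_accuracy` metric for a classification model. Confirm both in the ML_SCORE reference, and note that the model may need to be loaded first (see Load a Model). The function `score_model` is hypothetical.

```python
def score_model(cursor, table_name, target_column, model_handle,
                metric="balanced_accuracy"):
    """Score a trained model on a labeled table and return the metric value.

    Hypothetical helper; `cursor` is any DB-API cursor using the %s paramstyle.
    """
    # ML_SCORE writes the result into the @score session variable;
    # NULL is passed for the optional options argument.
    cursor.execute(
        "CALL sys.ML_SCORE(%s, %s, %s, %s, @score, NULL)",
        (table_name, target_column, model_handle, metric),
    )
    # Session variables are per connection, so read @score on the same one.
    cursor.execute("SELECT @score")
    (score,) = cursor.fetchone()
    return score
```

For example, `score_model(cur, 'census_data.census_validate', 'revenue', 'census_test')` scores the census model on `census_validate`, a hypothetical validation table. Scoring on data held out from training gives a more realistic estimate than rescoring the training table.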

ML_TRAIN Example

Before training a model, it is good practice to define your own model handle instead of letting ML_TRAIN generate one automatically. A handle you choose is easy to remember for later routines on the trained model, so you do not have to query the model catalog for it or depend on a session variable that can no longer be used once the current connection terminates. See Defining Model Handle to learn more.

To train a machine learning model:

Optionally, set a session variable to the model handle you want to use. When you pass this variable to ML_TRAIN, the model handle is set to the same value.
```
mysql> SET @variable = 'model_handle';
```
Replace @variable and model_handle with your own definitions. For example:
```
mysql> SET @census_model = 'census_test';
```
The model handle is set to census_test.
Run the ML_TRAIN routine.
```
mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), @variable);
```
Replace table_name, target_column_name, task_name, and variable with your own values.

The following example runs ML_TRAIN on the census_data.census_train training dataset.
```
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
```
Where:
- census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).
- revenue is the name of the target column, which contains ground truth values.
- JSON_OBJECT('task', 'classification') specifies the machine learning task type.
- @census_model is the session variable set in the previous step. It holds the user-defined model handle, census_test. If you do not define the model handle before training, ML_TRAIN generates one automatically, and the session variable stores that handle only for the duration of the connection. User variables are written as @var_name, and any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.

When training completes, query the model catalog for the model handle and the name of the training table to confirm that the model handle is set correctly. Replace user1 with your own user name.

```
mysql> SELECT model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG;
+-----------------------------------------------------+---------------------------------+
| model_handle                                        | train_table_name                |
+-----------------------------------------------------+---------------------------------+
| census_test                                         | census_data.census_train        |
+-----------------------------------------------------+---------------------------------+
1 row in set (0.0450 sec)
```
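The steps above can also be run from an application. The following Python sketch takes a DB-API cursor (for example, `mysql.connector.connect(...).cursor()` from mysql-connector-python), sets the model handle, runs ML_TRAIN, and confirms the handle in the model catalog. The function `train_with_handle` is hypothetical. It assumes the catalog schema is named `ML_SCHEMA_` followed by the user name, as in the query above, and a CALL that returns result sets may need driver-specific handling.

```python
def train_with_handle(cursor, table_name, target_column, task, model_handle, user):
    """Train a model under a chosen handle and return its MODEL_CATALOG row.

    Hypothetical helper; `cursor` is any DB-API cursor using the %s paramstyle.
    """
    # Step 1: choose the model handle up front so it is still known after
    # the connection closes, instead of relying on a generated one.
    cursor.execute("SET @model_handle = %s", (model_handle,))

    # Step 2: train; ML_TRAIN uses the value already stored in @model_handle.
    cursor.execute(
        "CALL sys.ML_TRAIN(%s, %s, JSON_OBJECT('task', %s), @model_handle)",
        (table_name, target_column, task),
    )

    # Step 3: confirm the catalog entry. Identifiers cannot be bound as
    # parameters, so the per-user schema name is backtick-quoted instead.
    schema = "`ML_SCHEMA_" + user.replace("`", "``") + "`"
    cursor.execute(
        f"SELECT model_handle, train_table_name FROM {schema}.MODEL_CATALOG "
        "WHERE model_handle = %s",
        (model_handle,),
    )
    row = cursor.fetchone()
    if row is None:
        raise RuntimeError(f"model handle {model_handle!r} not found in {schema}")
    return row
```

Calling `train_with_handle(cur, 'census_data.census_train', 'revenue', 'classification', 'census_test', 'user1')` mirrors the census example above.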

Tip

When done working with a trained model, it is good practice to unload it. See Unload a Model.
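One way to make unloading routine in application code is a Python context manager that loads a model on entry and unloads it on exit, even if an error occurs. This sketch assumes `sys.ML_MODEL_LOAD(model_handle, user)`, with NULL for a model owned by the current user, and `sys.ML_MODEL_UNLOAD(model_handle)`; confirm both in Load a Model and Unload a Model. The name `loaded_model` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(cursor, model_handle):
    """Load a model for the duration of a with-block, then unload it.

    Hypothetical helper; `cursor` is any DB-API cursor using the %s paramstyle.
    """
    # NULL as the user argument refers to a model owned by the current user.
    cursor.execute("CALL sys.ML_MODEL_LOAD(%s, NULL)", (model_handle,))
    try:
        yield model_handle
    finally:
        # Runs on normal exit and on exceptions, so the model is always unloaded.
        cursor.execute("CALL sys.ML_MODEL_UNLOAD(%s)", (model_handle,))
```

Inside `with loaded_model(cur, 'census_test'):` you would run the routines that use the model, such as ML_SCORE; the unload happens when the block ends.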

What's Next

For details on all training options and to view more examples for task-specific models, see ML_TRAIN.
Learn how to Load a Model.