Run the ML_TRAIN
routine on a
training dataset to produce a trained machine learning model.
ML_TRAIN
supports training of
classification, regression, and forecasting models. Use a
classification model to predict discrete values. Use a
regression model to predict continuous values. Use a forecasting
model to create timeseries forecasts for temporal data.
The time required to train a model can take a few minutes to a
few hours depending on the number of rows and columns in the
dataset, specified ML_TRAIN
parameters, and the size of the HeatWave Cluster. HeatWave AutoML supports
tables up to 10 GB in size with a maximum of 100 million rows
and or 1017 columns.
ML_TRAIN
stores machine learning
models in the MODEL_CATALOG
table. See
Section 3.14.1, “The Model Catalog”.
For ML_TRAIN
option descriptions,
see Section 3.16.1, “ML_TRAIN”.
The training dataset used with
ML_TRAIN
must reside in a table
on the MySQL DB System. For an example training dataset, see
Section 3.4.4, “Example Data”.
The following example runs
ML_TRAIN
on the
heatwaveml_bench.census_train
training
dataset:
mysql> CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue',
JSON_OBJECT('task', 'classification'), @census_model);
Where:
heatwaveml_bench.census_train
is the fully qualified name of the table that contains the training dataset (schema_name.table_name
).revenue
is the name of the target column, which contains ground truth values.-
JSON_OBJECT('task', 'classification')
specifies the machine learning task type.Specify
NULL
instead ofJSON
options to use the defaultclassification
task type.When using the
regression
task type, only a numeric target column is permitted.For the
forecasting
task type, see Section 3.9, “Forecasting”.For the
anomaly_detection
task type, see Section 3.10, “Anomaly Detection”For the
recommendation
task type, see Section 3.11, “Recommendations” @census_model
is the name of the user-defined session variable that stores the model handle for the duration of the connection. User variables are written as@
. Some of the examples in this guide usevar_name
@census_model
as the variable name. Any valid name for a user-defined variable is permitted, for example@my_model
).
After ML_TRAIN
trains a model,
the model is stored in the model catalog. To retrieve the
generated model handle, query the specified session variable;
for example:
mysql> SELECT @census_model;
+--------------------------------------------------+
| @census_model |
+--------------------------------------------------+
| heatwaveml_bench.census_train_user1_1636729526 |
+--------------------------------------------------+
While using the same connection used to execute
ML_TRAIN
, specify the session
variable, for example @census_model
, in
place of the model handle in other HeatWave AutoML routines, but
the session variable data is lost when the current session is
terminated. If you need to look up a model handle, you can do
so by querying the model catalog table. See
Section 3.14.8, “Model Handles”.
The quality and reliability of a trained model can be assessed
using the ML_SCORE
routine. For
more information, see Section 3.14.6, “Scoring Models”.
ML_TRAIN
displays the following
message if a trained model has a low score: Model Has a
low training score, expect low quality model
explanations
.