After preparing the data for a forecasting model, you can train the model.
This topic has the following sections.
Review and complete all the tasks to Prepare Data for a Forecasting Model.
Define the following required parameters to train a forecasting model.
Set the
task
parameter toforecasting
.-
datetime_index
: Define the column that has date and time data. The model uses this column as an index for the forecast variable. The following data types for this column are supported:DATETIME
,TIMESTAMP
,DATE
,TIME
, andYEAR
, or an auto-incrementing index.The forecast models
SARIMAXForecaster
,VARMAXForecaster
, andDynFactorForecaster
cannot back test, that is forecast into training data, when usingexogenous_variables
. Therefore, the predict table must not overlap thedatetime_index
with the training table. The start date in the predict table must be a date immediately following the last date in the training table whenexogenous_variables
are used. For example, the predict table has to start with year 2024 if the training table withYEAR
data typedatetime_index
ends with year 2023. The predict table cannot start with year, for example, 2025 or 2030, because that would miss out 1 and 6 years, respectively.When
options
do not includeexogenous_variables
, the predict table can overlap thedatetime_index
with the training table. This supports back testing, with the exception of the following models: SARIMAXForecaster, VARMAXForecaster, and DynFactorForecaster.The valid range of years for
datetime_index
dates must be between 1678 and 2261. An error is returned if any part of the training table or predict table has dates outside this range. The last date in the training table plus the predict table length must still be inside the valid year range. For example, if thedatetime_index
in the training table hasYEAR
data type, and the last date is year 2023, the predict table length must be less than 238 rows: 2261 minus 2023 equals 238 rows. endogenous_variables
: Define the column or columns to be forecast. One of these columns must also be specified as thetarget_column_name
.
Based on the type of forecasting model you train, set the
appropriate JSON
options:
exogenous_variables
: Define the column or columns that have independent, non-forecast, predictive variables. These optional variables are not forecast, but help to predict the future values of the forecast variables. These variables affect a model without being affected by it. For example, for sales forecasting these variables might be advertising expenditure, occurrence of promotional events, weather, or holidays. Review Forecasting Models to see which models support exogenous variables.model_list
: Set the type of forecasting model algorithm. See Forecasting Models to review supported algorithms.include_column_list
: Define the columns ofexogenous_variables
that must be included for training and should not be dropped.
You cannot run the following routines for a trained forecasting model:
After following the steps to
Prepare
Data for Forecasting Model, train the model with the
ML_TRAIN
routine and use the
electricity_demand_training
table
previously created. Before training the model, it is good
practice to define the model handle instead of automatically
creating one. See
Defining Model Handle.
-
Optionally, set the value of the session variable, which sets the model handle to this same value.
mysql> SET @variable = 'model_handle';
Replace
@variable
andmodel_handle
with your own definitions. For example:mysql> SET @model='forecasting_use_case';
The model handle is set to
forecasting_use_case
. -
Run the
ML_TRAIN
routine.mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), model_handle);
Replace
table_name
,target_column_name
,task_name
, andmodel_handle
with your own values.The following example runs
ML_TRAIN
on the training dataset previously created.mysql> CALL sys.ML_TRAIN('forecasting_data.electricity_demand_train', 'demand', JSON_OBJECT('task', 'forecasting', 'datetime_index', 'date', 'endogenous_variables', JSON_ARRAY('demand')), @model);
Where:
forecasting_data.electricity_demand_train
is the fully qualified name of the table that contains the training dataset (database_name.table_name
).demand
is the name of the target column, which contains ground truth values.-
The
JSON_OBJECT
defines the following:'task', 'forecasting'
specifies the machine learning task type.'datetime_index', 'date'
defines thedate
column as the one with data and time data.'endogenous_variables', JSON_ARRAY('demand')
defines the endogenous variables in aJSON_ARRAY
. Since it is a univariate model, the only endogenous variable isdemand
.
@model
is the session variable previously set that defines the model handle to the name defined by the user:forecasting_use_case
. If you do not define the model handle before training the model, the model handle is automatically generated, and the session variable only stores the model handle for the duration of the connection. User variables are written as@
. Any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.var_name
-
When the training operation finishes, the model handle is assigned to the
@model
session variable, and the model is stored in the model catalog. View the entry in the model catalog with the following query. Replaceuser1
with your MySQL account name.mysql> SELECT model_id, model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG WHERE model_handle = 'forecasting_use_case'; +----------+----------------------------------------------+--------------------------------------------+ | model_id | model_handle | train_table_name | +----------+----------------------------------------------+--------------------------------------------+ | 3 | forecasting_use_case | forecasting_data.electricity_demand_train | +----------+----------------------------------------------+--------------------------------------------+
Learn how to Generate Predictions for a Forecasting Model.
Review additional Syntax Examples for Forecast Training