The table used to train a model cannot exceed 10 GB, 100 million rows, or 1017 columns.
Each dataset must reside in a single table on the MySQL server. AutoML routines operate on a single table.
Table columns must use supported data types. See Supported Data Types for AutoML to learn more.
NaN (Not a Number) values are not recognized by MySQL and should be replaced by NULL.
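As a minimal sketch of the NaN replacement, assuming the rows are prepared in Python before they are loaded into the table: the helper `nan_to_null` below is a hypothetical name, not part of AutoML. It relies on the usual behavior of Python MySQL drivers, which send `None` as SQL NULL when it is passed as a query parameter.

```python
import math


def nan_to_null(rows):
    """Return a copy of rows with every float NaN replaced by None.

    Hypothetical helper for illustration. None is what Python MySQL
    drivers typically translate to SQL NULL in parameterized inserts.
    """
    return [
        tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
        for row in rows
    ]


# Example rows as they might come out of a CSV or numeric pipeline.
rows = [(1, 3.5, "a"), (2, float("nan"), "b")]
cleaned = nan_to_null(rows)
```

The cleaned rows can then be passed to a parameterized INSERT statement with whichever driver is in use.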
Refer to the following requirements for specific machine learning models.
Classification models: The target column must have at least two distinct values, and each distinct value should appear in at least five rows.
Regression models: The target column must be numeric.
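The two target-column requirements above could be checked before training with something like the following sketch. The function names are hypothetical, and skipping NULL (`None`) values when counting is an assumption made for this illustration, not documented AutoML behavior.

```python
from collections import Counter


def check_classification_target(values, min_classes=2, min_rows_per_class=5):
    """Return a list of problems found in a classification target column.

    Assumption: None (NULL) entries are not counted as a class.
    """
    counts = Counter(v for v in values if v is not None)
    problems = []
    if len(counts) < min_classes:
        problems.append(f"only {len(counts)} distinct value(s) in target")
    for label, n in counts.items():
        if n < min_rows_per_class:
            problems.append(f"value {label!r} appears in only {n} row(s)")
    return problems


def is_numeric_target(values):
    """Return True if every non-NULL target value is an int or float."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in values
        if v is not None
    )


ok_target = ["yes"] * 6 + ["no"] * 5
rare_target = ["yes"] * 6 + ["no"] * 2
```

An empty list from `check_classification_target` means that neither rule was violated.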
The ML_TRAIN routine ignores columns that are missing more than 20% of their values and columns that have the same value in every row. Missing values in numerical columns are replaced with the average value of the column, standardized to a mean of 0 and with a standard deviation of 1. Missing values in categorical columns are replaced with the most frequent value, and either one-hot or ordinal encoding is used to convert categorical values to numeric values. The input data as it exists in the MySQL database is not modified by ML_TRAIN.
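The preprocessing described above can be pictured with a small pure-Python sketch. This is not the ML_TRAIN implementation. All function names are hypothetical, the table is modeled as a dict of column lists with `None` standing in for NULL, and only ordinal encoding is shown.

```python
from collections import Counter
from statistics import mean, pstdev


def drop_unusable_columns(table, max_missing=0.20):
    """Keep columns with at most 20% missing values that are not constant."""
    kept = {}
    for name, col in table.items():
        missing_ratio = col.count(None) / len(col)
        if missing_ratio > max_missing:
            continue  # too many missing values
        if len(set(col)) == 1:
            continue  # same value in every row
        kept[name] = col
    return kept


def impute_and_standardize(col):
    """Fill missing numbers with the column mean, then scale to mean 0, std 1."""
    mu = mean(v for v in col if v is not None)
    filled = [mu if v is None else v for v in col]
    sigma = pstdev(filled)
    if sigma == 0:
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


def impute_most_frequent(col):
    """Fill missing categorical values with the most frequent value."""
    mode = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in col]


def ordinal_encode(col):
    """Map each distinct category to an integer code (sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(col)))}
    return [codes[v] for v in col]


table = {
    "age": [20, None, 40, 30, 30],
    "city": ["A", "B", None, "A", "A"],
    "constant": [1, 1, 1, 1, 1],
    "sparse": [None, None, 5, 6, 7],
}
usable = drop_unusable_columns(table)
age_z = impute_and_standardize(usable["age"])
city_codes = ordinal_encode(impute_most_frequent(usable["city"]))
```

Each function returns a new list, so the `table` dict is left unchanged, which mirrors the statement that the stored input data is not modified.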
To use AutoML, ensure that the MySQL user name that trains a model does not contain a period character ("."). For example, a user named 'joesmith'@'%' is permitted to train a model, but a user named 'joe.smith'@'%' is not. The model catalog schema created by the ML_TRAIN procedure incorporates the user name in the schema name (for example, ML_SCHEMA_joesmith), and a period is not a permitted schema name character.
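The user name rule can be illustrated with a short sketch. Both function names are hypothetical. The `ML_SCHEMA_` prefix follows the example in the text, and only the user name part of the account (not the host part) is passed in.

```python
def user_name_allows_training(user_name):
    """Return True if the user name contains no period character."""
    return "." not in user_name


def catalog_schema_name(user_name):
    """Build the model catalog schema name pattern shown in the text.

    Raises ValueError for user names containing a period, since a
    period is not a permitted schema name character.
    """
    if not user_name_allows_training(user_name):
        raise ValueError(f"user name {user_name!r} contains a period")
    return f"ML_SCHEMA_{user_name}"


schema = catalog_schema_name("joesmith")
```

Calling `catalog_schema_name("joe.smith")` raises `ValueError`, which corresponds to the second account in the example above.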
Learn more about the following:
Learn how to Create a Machine Learning Model.
Review Machine Learning Use Cases to create machine learning models with sample datasets.