
4.2 Additional AutoML Requirements

Before You Begin

Model and Table Sizes

  • The table used to train a model cannot exceed 10 GB, 100 million rows, or 1017 columns.
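Before training, it can help to compare a table's size against these limits. The following Python sketch is illustrative only. The function name `check_training_table_limits` is hypothetical, and it assumes "10 GB" means 10 × 1024³ bytes; the documentation does not say whether GB is decimal or binary, or exactly how table size is measured.

```python
# Hypothetical pre-flight check against the documented AutoML training-table limits.
# Assumption: "10 GB" is read as 10 * 1024**3 bytes; the docs do not specify GB vs GiB.
MAX_TABLE_BYTES = 10 * 1024**3
MAX_ROWS = 100_000_000
MAX_COLUMNS = 1017


def check_training_table_limits(size_bytes: int, row_count: int, column_count: int) -> list:
    """Return human-readable limit violations; an empty list means all limits are met.

    The limits say a table "cannot exceed" each value, so a value equal to a limit passes.
    """
    problems = []
    if size_bytes > MAX_TABLE_BYTES:
        problems.append(f"table size {size_bytes} bytes exceeds {MAX_TABLE_BYTES} bytes")
    if row_count > MAX_ROWS:
        problems.append(f"row count {row_count} exceeds {MAX_ROWS}")
    if column_count > MAX_COLUMNS:
        problems.append(f"column count {column_count} exceeds {MAX_COLUMNS}")
    return problems


if __name__ == "__main__":
    print(check_training_table_limits(2 * 1024**3, 5_000_000, 40))    # → []
    print(check_training_table_limits(2 * 1024**3, 5_000_000, 2000))  # one column-count violation
```

Approximate inputs could come from the `DATA_LENGTH` and `TABLE_ROWS` columns of `INFORMATION_SCHEMA.TABLES`; note that `TABLE_ROWS` is only an estimate for InnoDB tables, so an exact `COUNT(*)` is safer near the row limit.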

Data Requirements

  • Each dataset must reside in a single table on the MySQL server. AutoML routines operate on a single table.

  • Table columns must use supported data types. See Supported Data Types for AutoML to learn more.

  • NaN (Not a Number) values are not recognized by MySQL and should be replaced by NULL.

  • Refer to the following requirements for specific machine learning models.

    • Classification models: The target column must have at least two distinct values, and each distinct value should appear in at least five rows.

    • Regression models: The target column must be numeric.
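The data requirements above can be checked on the client side before a table is loaded. The following Python sketch is illustrative, not part of MySQL AI. The helper names `nan_to_none`, `check_classification_target`, and `check_regression_target` are hypothetical. It assumes NULL (`None`) target values are ignored when counting classes, which the documentation does not specify, and it inspects Python values rather than the MySQL column type, which is what the regression requirement actually refers to.

```python
import math
from collections import Counter
from numbers import Number


def nan_to_none(rows):
    """Replace float NaN values with None so that they are stored in MySQL as NULL."""
    return [
        [None if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]


def check_classification_target(values, min_rows_per_value=5):
    """Return problems with a classification target column.

    Assumption: None (NULL) targets are excluded from the class counts.
    """
    counts = Counter(v for v in values if v is not None)
    problems = []
    if len(counts) < 2:
        problems.append(f"target has {len(counts)} distinct value(s); at least 2 required")
    for value, n in counts.items():
        if n < min_rows_per_value:
            problems.append(f"value {value!r} appears in {n} row(s); at least {min_rows_per_value} expected")
    return problems


def check_regression_target(values):
    """Return problems with a regression target column, which must be numeric.

    bool is rejected explicitly because Python treats it as a Number subclass.
    """
    return [
        f"non-numeric target value {v!r}"
        for v in values
        if v is not None and (isinstance(v, bool) or not isinstance(v, Number))
    ]
```

For example, a target column containing five `"yes"` rows and only two `"no"` rows produces one problem from `check_classification_target`, because `"no"` appears in fewer than five rows.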

Note

The ML_TRAIN routine ignores columns that are missing more than 20% of their values, as well as columns that have the same value in every row. Missing values in numerical columns are replaced with the column's average value, and numerical columns are standardized to a mean of 0 and a standard deviation of 1. Missing values in categorical columns are replaced with the most frequent value, and categorical values are converted to numeric values using either one-hot or ordinal encoding. ML_TRAIN does not modify the input data as it exists in the MySQL database.
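To make the preprocessing rules in this note concrete, the following pure-Python sketch applies them to a small in-memory dataset. It is an approximation for illustration, not the ML_TRAIN implementation. The function `preprocess` is hypothetical; it treats `None` as a missing value, uses the population standard deviation, and always chooses ordinal encoding, whereas ML_TRAIN may use one-hot encoding instead.

```python
import statistics
from collections import Counter
from numbers import Number

MISSING_THRESHOLD = 0.20  # columns missing more than 20% of values are ignored


def is_numeric(v):
    # bool is a Number subclass in Python, so exclude it explicitly.
    return isinstance(v, Number) and not isinstance(v, bool)


def preprocess(columns):
    """Return a new dict of preprocessed columns; the input dict is left unchanged.

    columns: mapping of column name -> list of values, with None marking a missing value.
    """
    out = {}
    for name, values in columns.items():
        if not values:
            continue
        present = [v for v in values if v is not None]
        missing_ratio = (len(values) - len(present)) / len(values)
        if missing_ratio > MISSING_THRESHOLD or len(set(present)) <= 1:
            continue  # mostly missing or constant column: ignored
        if all(is_numeric(v) for v in present):
            # Impute the column mean, then standardize to mean 0 and std 1.
            mean = statistics.fmean(present)
            filled = [mean if v is None else float(v) for v in values]
            std = statistics.pstdev(filled)
            out[name] = [(v - mean) / std for v in filled]
        else:
            # Impute the most frequent value, then ordinal-encode the categories.
            mode = Counter(present).most_common(1)[0][0]
            filled = [mode if v is None else v for v in values]
            codes = {cat: i for i, cat in enumerate(sorted(set(filled), key=str))}
            out[name] = [codes[v] for v in filled]
    return out
```

With `{"age": [20, 30, None, 50, 40], "const": [1, 1, 1, 1, 1]}`, the sketch drops `const` as a constant column and returns `age` as `[-1.5, -0.5, 0.0, 1.5, 0.5]`: the missing value is imputed as 35, the mean of the present values, and the column is then standardized.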

MySQL User Names

To use AutoML, ensure that the MySQL user name that trains a model does not contain a period character ("."). For example, a user named 'joesmith'@'%' is permitted to train a model, but a user named 'joe.smith'@'%' is not. The model catalog schema created by the ML_TRAIN procedure incorporates the user name in the schema name (for example, ML_SCHEMA_joesmith), and a period is not permitted in a schema name.
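A client-side check for this rule might look like the following Python sketch. The helper names `user_part` and `model_catalog_schema` are hypothetical, and the sketch assumes the schema name is always formed by prefixing `ML_SCHEMA_` to the user name, as in the ML_SCHEMA_joesmith example; the documentation shows only that example, not a general formula.

```python
def user_part(account: str) -> str:
    """Extract the user name from an account string such as "'joe.smith'@'%'"."""
    user, sep, _host = account.rpartition("@")
    if not sep:  # no "@": treat the whole string as the user name
        user = account
    return user.strip("'\"`")


def model_catalog_schema(user: str) -> str:
    """Return the assumed model catalog schema name for `user`.

    Raises ValueError when the user name contains a period, which AutoML does not allow.
    """
    if "." in user:
        raise ValueError(f"user name {user!r} contains '.', which AutoML does not permit")
    return f"ML_SCHEMA_{user}"
```

Calling `model_catalog_schema(user_part("'joesmith'@'%'"))` returns `"ML_SCHEMA_joesmith"`, while the same call for `'joe.smith'@'%'` raises `ValueError`.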

What's Next