General requirements for HeatWave AutoML data include the following:
For information about loading data into a MySQL DB System, see Importing and Exporting Databases.
Tables used with HeatWave AutoML must not exceed 10 GB, 100 million rows, or 1017 columns. Before MySQL 8.0.29, the column limit was 900.
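As a rough illustration, the limits above can be written as a pre-flight check. The following Python sketch is not part of HeatWave AutoML: the function name `check_table_limits` and its arguments are hypothetical, and it assumes that 10 GB means 10 × 1024³ bytes, which the text does not specify. How the size, row count, and column count are obtained is left to the caller.

```python
# Limits quoted in the text above. Reading "10 GB" as 10 * 1024**3 bytes
# is an assumption; the documentation does not state the exact unit.
MAX_BYTES = 10 * 1024**3
MAX_ROWS = 100_000_000
MAX_COLUMNS = 1017  # use 900 for releases before MySQL 8.0.29


def check_table_limits(size_bytes, row_count, column_count,
                       max_columns=MAX_COLUMNS):
    """Return a list of limit violations; an empty list means none found."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"table size {size_bytes} bytes exceeds {MAX_BYTES}")
    if row_count > MAX_ROWS:
        problems.append(f"row count {row_count} exceeds {MAX_ROWS}")
    if column_count > max_columns:
        problems.append(f"column count {column_count} exceeds {max_columns}")
    return problems
```

For a release before MySQL 8.0.29, pass `max_columns=900`.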
Table columns must use supported data types. For supported data types and recommendations for how to handle unsupported types, see Section 3.14, “Supported Data Types”.
NaN (Not a Number) values are not recognized by MySQL and should be replaced by NULL.
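If the data is prepared in a client program before loading, NaN values can be filtered out at that stage. The following Python sketch assumes that the replacement value is NULL and that the database driver in use sends Python `None` as SQL NULL, which is the usual behavior of MySQL drivers for Python but should be confirmed for your driver. The helper name `nan_to_null` is hypothetical.

```python
import math


def nan_to_null(row):
    """Return a copy of a row with float NaN values replaced by None.

    Assumption: the driver used for the INSERT maps None to SQL NULL.
    """
    return tuple(
        None if isinstance(value, float) and math.isnan(value) else value
        for value in row
    )
```

Non-float values, including strings and existing `None` entries, pass through unchanged.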
The target column in a training dataset for a classification model must have at least two distinct values, and each distinct value should appear in at least five rows. For a regression model, only a numeric target column is permitted.
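The classification requirement above can be checked on the target column before training. The following Python sketch is illustrative only: the function name `check_classification_target` and its parameters are hypothetical, and it operates on target values that have already been read into a list.

```python
from collections import Counter


def check_classification_target(values, min_classes=2, min_rows_per_class=5):
    """Return a list of problems found in a classification target column."""
    counts = Counter(values)
    problems = []
    # The target must have at least two distinct values.
    if len(counts) < min_classes:
        problems.append(
            f"only {len(counts)} distinct value(s); need {min_classes}"
        )
    # Each distinct value should appear in at least five rows.
    rare = sorted(str(v) for v, n in counts.items() if n < min_rows_per_class)
    if rare:
        problems.append(
            f"values with fewer than {min_rows_per_class} rows: "
            + ", ".join(rare)
        )
    return problems
```

An empty list means that both conditions hold for the values supplied.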
The ML_TRAIN routine ignores columns that are missing more than 20% of their values and columns that have the same value in every row. Missing values in numerical columns are replaced with the average value of the column, standardized to a mean of 0 and a standard deviation of 1. Missing values in categorical columns are replaced with the most frequent value, and either one-hot or ordinal encoding is used to convert categorical values to numeric values. The input data as it exists in the MySQL database is not modified by ML_TRAIN.
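The preprocessing described above can be sketched in a few lines of Python. This is not the ML_TRAIN implementation, only an illustration under stated assumptions: missing values are represented by `None`, imputation happens before standardization, the population standard deviation is used, and categorical values are ordinal-encoded by sorted order. The function name `preprocess` is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def preprocess(columns, max_missing=0.20):
    """Return preprocessed copies of the columns; the input is not modified.

    columns maps a column name to a list of values, with None for missing.
    """
    out = {}
    for name, values in columns.items():
        if not values:
            continue
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        # Ignore columns that are too sparse or that hold a single value.
        if missing_ratio > max_missing or len(set(present)) <= 1:
            continue
        if all(isinstance(v, (int, float)) for v in present):
            # Numerical: impute the column average, then standardize.
            avg = mean(present)
            filled = [avg if v is None else v for v in values]
            mu, sigma = mean(filled), pstdev(filled)  # pstdev is an assumption
            out[name] = [(v - mu) / sigma for v in filled]
        else:
            # Categorical: impute the most frequent value, then encode.
            mode = Counter(present).most_common(1)[0][0]
            filled = [mode if v is None else v for v in values]
            codes = {v: i for i, v in enumerate(sorted(set(filled)))}
            out[name] = [codes[v] for v in filled]
    return out
```

The sketch builds new lists for every column it keeps, mirroring the statement that the stored input data is left unchanged. One-hot encoding is omitted for brevity.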