3.4.3 General Data Requirements

General requirements for HeatWave AutoML data include the following:

  • Each dataset must reside in a single table on the MySQL DB System. HeatWave AutoML routines such as ML_TRAIN, ML_PREDICT_TABLE, and ML_EXPLAIN_TABLE operate on a single table.

    For information about loading data into a MySQL DB System, see Importing and Exporting Databases.

  • Tables used with HeatWave AutoML must not exceed 10 GB, 100 million rows, or 1017 columns. Before MySQL 8.0.29, the column limit was 900.

  • Table columns must use supported data types. For supported data types and recommendations for how to handle unsupported types, see Section 3.16, “Supported Data Types”.

  • NaN (Not a Number) values are not recognized by MySQL and should be replaced by NULL.

  • The target column in a training dataset for a classification model must have at least two distinct values, and each distinct value should appear in at least five rows. For a regression model, only a numeric target column is permitted.
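Several of the requirements above can be checked on the client side before the data is loaded. The following is a minimal Python sketch, not part of HeatWave AutoML. The function names (`nan_to_null`, `check_table_shape`, `check_classification_target`) and the list-of-dictionaries row format are assumptions made for illustration, and the 10 GB size limit is not checked.

```python
import math
from collections import Counter

# Limits taken from the requirements listed above.
MAX_ROWS = 100_000_000
MAX_COLUMNS = 1017
MIN_ROWS_PER_CLASS = 5  # recommended minimum number of rows per class


def nan_to_null(rows):
    """Return a copy of rows with float NaN values replaced by None (SQL NULL)."""
    return [
        {
            key: (None if isinstance(value, float) and math.isnan(value) else value)
            for key, value in row.items()
        }
        for row in rows
    ]


def check_table_shape(n_rows, n_columns):
    """Return a list of problems with the row and column counts."""
    problems = []
    if n_rows > MAX_ROWS:
        problems.append(f"{n_rows} rows exceeds {MAX_ROWS}")
    if n_columns > MAX_COLUMNS:
        problems.append(f"{n_columns} columns exceeds {MAX_COLUMNS}")
    return problems


def check_classification_target(rows, target):
    """Return a list of problems with a classification target column."""
    # NULL targets are skipped; only actual class labels are counted.
    counts = Counter(row[target] for row in rows if row[target] is not None)
    problems = []
    if len(counts) < 2:
        problems.append("target needs at least two distinct values")
    for value, n in counts.items():
        if n < MIN_ROWS_PER_CLASS:
            problems.append(f"class {value!r} appears in only {n} rows")
    return problems
```

Run `nan_to_null` first so that NaN values do not reach the other checks, then treat any non-empty list returned by the two check functions as a reason to fix the dataset before calling ML_TRAIN.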

Note

The ML_TRAIN routine ignores columns that are missing more than 20% of their values and columns that have the same value in every row. Missing values in numerical columns are replaced with the average value of the column, which is standardized to a mean of 0 and a standard deviation of 1. Missing values in categorical columns are replaced with the most frequent value, and either one-hot or ordinal encoding is used to convert categorical values to numeric values. ML_TRAIN does not modify the input data as it exists in the MySQL database.
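To make the preprocessing described in this note concrete, the following Python sketch reproduces the same steps on plain lists. It is an illustration only and not the ML_TRAIN implementation. The function names, the use of the population standard deviation, the order of imputation before standardization, and the treatment of a column as constant when its non-missing values are all equal are assumptions.

```python
import statistics
from collections import Counter

MISSING_THRESHOLD = 0.20  # columns missing more than 20% of values are ignored


def usable(values):
    """True if a column would be kept: at most 20% missing and not constant."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    return missing / len(values) <= MISSING_THRESHOLD and len(set(present)) > 1


def impute_numeric(values):
    """Fill missing values with the column mean, then scale to mean 0, std 1."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    filled = [mean if v is None else v for v in values]
    # Assumption: population standard deviation; call only on usable columns,
    # otherwise std can be 0.
    std = statistics.pstdev(filled)
    return [(v - mean) / std for v in filled]


def impute_categorical(values):
    """Fill missing values with the most frequent non-missing value."""
    most_common = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted distinct categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

The functions return new lists and leave their inputs unchanged, which mirrors the statement that ML_TRAIN does not modify the stored data.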