Related Documentation Download this Manual
PDF (US Ltr) - 1.6Mb
PDF (A4) - 1.6Mb


MySQL HeatWave User Guide  /  HeatWave AutoML  /  HeatWave AutoML and Lakehouse

3.12 HeatWave AutoML and Lakehouse

MySQL 8.1.0 adds support for HeatWave AutoML routines loading data directly from Object Storage with Lakehouse. Lakehouse must be enabled. See Chapter 4, HeatWave Lakehouse.

ML_TRAIN, ML_MODEL_LOAD, ML_EXPLAIN ML_PREDICT_TABLE, ML_EXPLAIN_TABLE, and ML_SCORE routines require no changes.

ML_PREDICT_ROW and ML_EXPLAIN_ROW routines cannot use a FROM clause.

Loading data from Lakehouse into HeatWave and unloading:

  • If the Lakehouse table had not been loaded into HeatWave before a HeatWave AutoML command, then the data will be unloaded after the command.

  • If the Lakehouse table had been loaded into HeatWave before a HeatWave AutoML command, then the data will remain in HeatWave after the command.

HeatWave AutoML commands operate on data loaded into HeatWave. If the original Lakehouse data in Object Storage is deleted or modified this will not affect a HeatWave AutoML command, until the data is unloaded from HeatWave.

Syntax Examples

The following examples use data from: Bank Marketing. The target column is y.

  • A CREATE TABLE example with Lakehouse details that loads the training dataset.

    mysql> CREATE TABLE bank_marketing_lakehouse_train(
            age int,
            job varchar(255),
            marital varchar(255),
            education varchar(255),
            default1 varchar(255),
            balance float,
            housing varchar(255),
            loan varchar(255),
            contact varchar(255),
            day int,
            month varchar(255),
            duration float,
            campaign int,
            pdays float,
            previous float,
            poutcome varchar(255),
            y varchar(255)
        )
    	ENGINE=LAKEHOUSE
    	SECONDARY_ENGINE=RAPID
      ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
                                     "skip_rows": 1,
                                     "field_delimiter":",",
                                     "record_delimiter":"\\n"}}',
                         "file": [{  "region": "region",
                                     "namespace": "namespace",
                                     "bucket": "bucket",
                                     "prefix": "mlbench/bank_marketing_train.csv"}]'
  • An ALTER TABLE example with Lakehouse details that loads the test dataset.

    mysql> CREATE TABLE bank_marketing_lakehouse_train(
            age int,
            job varchar(255),
            marital varchar(255),
            education varchar(255),
            default1 varchar(255),
            balance float,
            housing varchar(255),
            loan varchar(255),
            contact varchar(255),
            day int,
            month varchar(255),
            duration float,
            campaign int,
            pdays float,
            previous float,
            poutcome varchar(255),
            y varchar(255)
        );
    
    mysql> ALTER TABLE bank_marketing_lakehouse_test
      ENGINE=LAKEHOUSE 
      SECONDARY_ENGINE=RAPID
      ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
                                     "skip_rows": 1,
                                     "field_delimiter":",",
                                     "record_delimiter":"\\n"}}',
                         "file": [{  "region": "region",
                                     "namespace": "namespace",
                                     "bucket": "bucket",
                                     "prefix": "mlbench/bank_marketing_test.csv"}]';
  • ML_TRAIN, ML_MODEL_LOAD, and ML_SCORE examples that use the Lakehouse data.

    mysql> CALL sys.ML_TRAIN('ml_data.bank_marketing_lakehouse_train', 'y', NULL, @bank_model);
    
    mysql> CALL sys.ML_MODEL_LOAD(@bank_model, NULL);
    
    mysql> CALL sys.ML_SCORE('ml_data.bank_marketing_lakehouse_test', 'y', @bank_model, 
            'balanced_accuracy', @score, NULL);
  • ML_PREDICT_TABLE, ML_EXPLAIN, and ML_EXPLAIN_TABLE examples that use the Lakehouse data. ML_EXPLAIN_TABLE might take a long time for a large data set, but this is shape dependent.

    mysql> CALL sys.ML_PREDICT_TABLE('ml_data.bank_marketing_lakehouse_test', @bank_model, 
            'ml_data.bank_marketing_lakehouse_test_predictions', NULL);
    
    mysql>  CALL sys.ML_EXPLAIN('ml_data.bank_marketing_lakehouse_test', 'y', @bank_model, 
            JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    
    mysql> CALL sys.ML_EXPLAIN_TABLE('ml_data.bank_marketing_lakehouse_test', @bank_model, 
            'ml_data.bank_marketing_lakehouse_test_explanations', 
            JSON_OBJECT('prediction_explainer', 'permutation_importance'));
  • Examples for ML_PREDICT_ROW and ML_EXPLAIN_ROW that insert data directly, and avoid the FROM clause.

    mysql> SELECT sys.ML_PREDICT_ROW('{
            "age": 37, 
            "job": "admin.", 
            "marital": "married", 
            "education": "unknown", 
            "default1": "no", 
            "balance": 734, 
            "housing": "yes", 
            "loan": "no", 
            "contact": "unknown", 
            "day": 21, 
            "month": "may", 
            "duration": 1106, 
            "campaign": 1, 
            "pdays": -1, 
            "previous": 0, 
            "poutcome": 
            "unknown", 
            "y": "no"}', 
            @bank_model, NULL);
    
    mysql> SELECT sys.ML_EXPLAIN_ROW('{
            "age": 37, 
            "job": "admin.", 
            "marital": "married", 
            "education": "unknown", 
            "default1": "no", 
            "balance": 734, 
            "housing": "yes", 
            "loan": "no", 
            "contact": "unknown", 
            "day": 21, 
            "month": "may", 
            "duration": 1106, 
            "campaign": 1, 
            "pdays": -1, 
            "previous": 0, 
            "poutcome": 
            "unknown", 
            "y": "no"}',
            @bank_model, 
            JSON_OBJECT('prediction_explainer', 'permutation_importance'));
  • Examples for ML_PREDICT_ROW and ML_EXPLAIN_ROW that insert data directly with a JSON object, and avoid the FROM clause.

    mysql> SET @row_input = JSON_OBJECT(
            'age', 37,
            'job', 'admin.',
            'marital', 'married',
            'education', 'unknown',
            'default1', 'no',
            'balance', 734,
            'housing', 'yes',
            'loan', 'no',
            'contact', 'unknown',
            'day', 21,
            'month', 'may',
            'duration', 1106,
            'campaign', 1,
            'pdays', -1,
            'previous', 0,
            'poutcome', 'unknown',
            'y', 'no');
    
    mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @bank_model, NULL);
    
    mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @bank_model, 
            JSON_OBJECT('prediction_explainer', 'permutation_importance'));
  • Examples for ML_PREDICT_ROW and ML_EXPLAIN_ROW that copies four rows to an InnoDB table, and then uses a FROM clause.

    mysql> ALTER TABLE bank_marketing_lakehouse_test SECONDARY_LOAD;
    
    mysql> CREATE TABLE bank_marketing_lakehouse_test_innoDB
            AS SELECT * from bank_marketing_lakehouse_test LIMIT 4;
    
    mysql> SET @row_input = JSON_OBJECT(
            'age', bank_marketing_lakehouse_test_innoDB.age,
            'job', bank_marketing_lakehouse_test_innoDB.job,
            'marital', bank_marketing_lakehouse_test_innoDB.marital,
            'education', bank_marketing_lakehouse_test_innoDB.education,
            'default1', bank_marketing_lakehouse_test_innoDB.default1,
            'balance', bank_marketing_lakehouse_test_innoDB.balance,
            'housing', bank_marketing_lakehouse_test_innoDB.housing,
            'loan', bank_marketing_lakehouse_test_innoDB.loan,
            'contact', bank_marketing_lakehouse_test_innoDB.contact,
            'day', bank_marketing_lakehouse_test_innoDB.day,
            'month', bank_marketing_lakehouse_test_innoDB.month,
            'duration', bank_marketing_lakehouse_test_innoDB.duration,
            'campaign', bank_marketing_lakehouse_test_innoDB.campaign,
            'pdays', bank_marketing_lakehouse_test_innoDB.pdays,
            'previous', bank_marketing_lakehouse_test_innoDB.previous,
            'poutcome', bank_marketing_lakehouse_test_innoDB.poutcome);
    
    mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @bank_model, NULL);
            FROM bank_marketing_lakehouse_test_innoDB LIMIT 4;
    
    mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @bank_model,
            JSON_OBJECT('prediction_explainer', 'permutation_importance'))
            FROM bank_marketing_lakehouse_test_innoDB LIMIT 4;