HeatWave AutoML routines can load data directly from Object Storage with Lakehouse. Lakehouse must be enabled. See Chapter 5, HeatWave Lakehouse.
ML_TRAIN
,
ML_MODEL_LOAD
,
ML_EXPLAIN
ML_PREDICT_TABLE
,
ML_EXPLAIN_TABLE
, and
ML_SCORE
routines require no
changes.
ML_PREDICT_ROW
and
ML_EXPLAIN_ROW
routines cannot
use a FROM
clause.
Loading data from Lakehouse into HeatWave and unloading:
If the Lakehouse table had not been loaded into HeatWave before a HeatWave AutoML command, then the data will be unloaded after the command.
If the Lakehouse table had been loaded into HeatWave before a HeatWave AutoML command, then the data will remain in HeatWave after the command.
HeatWave AutoML commands operate on data loaded into HeatWave. If the original Lakehouse data in Object Storage is deleted or modified this will not affect a HeatWave AutoML command, until the data is unloaded from HeatWave.
The following examples use data from:
Bank
Marketing. The target column is y
.
-
A
CREATE TABLE
example with Lakehouse details that loads the training dataset. See: Section 5.2.2, “Lakehouse External Table Syntax”.mysql> CREATE TABLE bank_marketing_lakehouse_train( age int, job varchar(255), marital varchar(255), education varchar(255), default1 varchar(255), balance float, housing varchar(255), loan varchar(255), contact varchar(255), day int, month varchar(255), duration float, campaign int, pdays float, previous float, poutcome varchar(255), y varchar(255) ) ENGINE=LAKEHOUSE SECONDARY_ENGINE=RAPID ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "skip_rows": 1, "field_delimiter":",", "record_delimiter":"\\n"}}', "file": [{ "region": "region", "namespace": "namespace", "bucket": "bucket", "prefix": "mlbench/bank_marketing_train.csv"}]'
-
An
ALTER TABLE
example with Lakehouse details that loads the test dataset.mysql> CREATE TABLE bank_marketing_lakehouse_test( age int, job varchar(255), marital varchar(255), education varchar(255), default1 varchar(255), balance float, housing varchar(255), loan varchar(255), contact varchar(255), day int, month varchar(255), duration float, campaign int, pdays float, previous float, poutcome varchar(255), y varchar(255) ); mysql> ALTER TABLE bank_marketing_lakehouse_test ENGINE=LAKEHOUSE SECONDARY_ENGINE=RAPID ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "skip_rows": 1, "field_delimiter":",", "record_delimiter":"\\n"}}', "file": [{ "region": "region", "namespace": "namespace", "bucket": "bucket", "prefix": "mlbench/bank_marketing_test.csv"}]';
-
ML_TRAIN
,ML_MODEL_LOAD
, andML_SCORE
examples that use the Lakehouse data.mysql> CALL sys.ML_TRAIN('ml_data.bank_marketing_lakehouse_train', 'y', NULL, @bank_model); mysql> CALL sys.ML_MODEL_LOAD(@bank_model, NULL); mysql> CALL sys.ML_SCORE('ml_data.bank_marketing_lakehouse_test', 'y', @bank_model, 'balanced_accuracy', @score, NULL);
-
ML_PREDICT_TABLE
,ML_EXPLAIN
, andML_EXPLAIN_TABLE
examples that use the Lakehouse data.ML_EXPLAIN_TABLE
might take a long time for a large data set, but this is shape dependent.mysql> CALL sys.ML_PREDICT_TABLE('ml_data.bank_marketing_lakehouse_test', @bank_model, 'ml_data.bank_marketing_lakehouse_test_predictions', NULL); mysql> CALL sys.ML_EXPLAIN('ml_data.bank_marketing_lakehouse_test', 'y', @bank_model, JSON_OBJECT('prediction_explainer', 'permutation_importance')); mysql> CALL sys.ML_EXPLAIN_TABLE('ml_data.bank_marketing_lakehouse_test', @bank_model, 'ml_data.bank_marketing_lakehouse_test_explanations', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
-
Examples for
ML_PREDICT_ROW
andML_EXPLAIN_ROW
that insert data directly, and avoid theFROM
clause.mysql> SELECT sys.ML_PREDICT_ROW('{ "age": 37, "job": "admin.", "marital": "married", "education": "unknown", "default1": "no", "balance": 734, "housing": "yes", "loan": "no", "contact": "unknown", "day": 21, "month": "may", "duration": 1106, "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown", "y": "no"}', @bank_model, NULL); mysql> SELECT sys.ML_EXPLAIN_ROW('{ "age": 37, "job": "admin.", "marital": "married", "education": "unknown", "default1": "no", "balance": 734, "housing": "yes", "loan": "no", "contact": "unknown", "day": 21, "month": "may", "duration": 1106, "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown", "y": "no"}', @bank_model, JSON_OBJECT('prediction_explainer', 'permutation_importance'));
-
Examples for
ML_PREDICT_ROW
andML_EXPLAIN_ROW
that insert data directly with aJSON
object, and avoid theFROM
clause.mysql> SET @row_input = JSON_OBJECT( 'age', 37, 'job', 'admin.', 'marital', 'married', 'education', 'unknown', 'default1', 'no', 'balance', 734, 'housing', 'yes', 'loan', 'no', 'contact', 'unknown', 'day', 21, 'month', 'may', 'duration', 1106, 'campaign', 1, 'pdays', -1, 'previous', 0, 'poutcome', 'unknown', 'y', 'no'); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @bank_model, NULL); mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @bank_model, JSON_OBJECT('prediction_explainer', 'permutation_importance'));
-
Examples for
ML_PREDICT_ROW
andML_EXPLAIN_ROW
that copies four rows to an InnoDB table, and then uses aFROM
clause.mysql> ALTER TABLE bank_marketing_lakehouse_test SECONDARY_LOAD; mysql> CREATE TABLE bank_marketing_lakehouse_test_innoDB AS SELECT * from bank_marketing_lakehouse_test LIMIT 4; mysql> SET @row_input = JSON_OBJECT( 'age', bank_marketing_lakehouse_test_innoDB.age, 'job', bank_marketing_lakehouse_test_innoDB.job, 'marital', bank_marketing_lakehouse_test_innoDB.marital, 'education', bank_marketing_lakehouse_test_innoDB.education, 'default1', bank_marketing_lakehouse_test_innoDB.default1, 'balance', bank_marketing_lakehouse_test_innoDB.balance, 'housing', bank_marketing_lakehouse_test_innoDB.housing, 'loan', bank_marketing_lakehouse_test_innoDB.loan, 'contact', bank_marketing_lakehouse_test_innoDB.contact, 'day', bank_marketing_lakehouse_test_innoDB.day, 'month', bank_marketing_lakehouse_test_innoDB.month, 'duration', bank_marketing_lakehouse_test_innoDB.duration, 'campaign', bank_marketing_lakehouse_test_innoDB.campaign, 'pdays', bank_marketing_lakehouse_test_innoDB.pdays, 'previous', bank_marketing_lakehouse_test_innoDB.previous, 'poutcome', bank_marketing_lakehouse_test_innoDB.poutcome); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @bank_model, NULL); FROM bank_marketing_lakehouse_test_innoDB LIMIT 4; mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @bank_model, JSON_OBJECT('prediction_explainer', 'permutation_importance')) FROM bank_marketing_lakehouse_test_innoDB LIMIT 4;