Documentation Home
MySQL HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 1.0Mb
PDF (A4) - 1.0Mb


MySQL HeatWave User Guide  /  HeatWave ML  /  Explanations

3.6 Explanations

Explanations are generated by running ML_EXPLAIN_ROW or ML_EXPLAIN_TABLE on unlabeled data; that is, it must have the same feature columns as the data used to train the model but no target column.

ML_EXPLAIN_ROW generates explanations for one or more rows of data. ML_EXPLAIN_TABLE generates explanations on an entire table of data and saves the results to an output table.

Explanations help you understand which features have the most influence on a prediction. Feature importance is presented as a value ranging from -1 to 1. A positive value indicates that a feature contributed toward the prediction. A negative value indicates that the feature contributed toward a different prediction; for example, if a feature in a loan approval model with two possible predictions ('approve' and 'reject') has a negative value for an 'approve' prediction, that feature would have a positive value for a 'reject' prediction. A value of 0 or near 0 indicates that the feature value has no impact on the prediction to which it applies.

ML_EXPLAIN_* routines limit explanations to the 100 most relevant features.

Row Explanations

ML_EXPLAIN_ROW explains predictions for one or more rows of unlabeled data. It is invoked using a SELECT statement. For ML_EXPLAIN_ROW parameter descriptions, see Section 3.8.4, “ML_EXPLAIN_ROW”.

Before running ML_EXPLAIN_ROW, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.7.2, “Loading Models”.

The following example generates explanations for a single row of unlabeled data, which is assigned to a @row_input session variable:

SET @row_input = JSON_OBJECT( 
"age", 25, 
"workclass", "Private", 
"fnlwgt", 226802, 
"education", "11th", 
"education-num", 7, 
"marital-status", "Never-married", 
"occupation", "Machine-op-inspct", 
"relationship", "Own-child", 
"race", "Black", 
"sex", "Male", 
"capital-gain", 0, 
"capital-loss", 0, 
"hours-per-week", 40, 
"native-country", "United-States");

SELECT sys.ML_EXPLAIN_ROW(@row_input, @census_model);

where:

  • @row_input is a session variable containing a row of unlabeled data. The data is specified in JSON key-value format. The column names must match the feature column names in the training dataset.

  • @census_model is the session variable that contains the model handle.

ML_EXPLAIN_ROW output includes a prediction, the features used to make the prediction, and a weighted numerical value that indicates feature importance, in the following format: ("feature_attribution": value). From MyQL 8.0.30, output includes a Notes field that identifies features with the greatest impact on predictions and reports a warning if the model is low quality.

You can also run ML_EXPLAIN_ROW on multiple rows of data selected from a table. For an example, refer to the syntax examples in Section 3.8.4, “ML_EXPLAIN_ROW”.

Table Explanations

ML_EXPLAIN_TABLE explains predictions for an entire table of unlabeled data and saves results to an output table. Explanations are performed in parallel. For ML_EXPLAIN_TABLE parameter descriptions, see Section 3.8.5, “ML_EXPLAIN_TABLE”.

ML_EXPLAIN_TABLE is a compute intensive process. Limiting operations to batches of 10 to 100 rows by splitting large tables into smaller tables is recommended.

The following example creates a table with 10 rows of data selected from the census_test dataset and generates explanations for that table.

Before running ML_EXPLAIN_TABLE, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.7.2, “Loading Models”.

The following example creates a table with 10 rows of unlabeled test data and generates explanations for that table:

CREATE TABLE heatwaveml_bench.census_test_subset AS SELECT * FROM heatwaveml_bench.census_test 
LIMIT 10;  
  
CALL sys.ML_EXPLAIN_TABLE('heatwaveml_bench.census_test_subset', @census_model, 
'heatwaveml_bench.census_explanations');

where:

  • heatwaveml_bench.census_test_subset is the fully qualified name of the input table (schema_name.table_name). The table must have the same feature column names as the training dataset but no target column.

  • @census_model is the session variable that contains the model handle.

  • heatwaveml_bench.census_explanations is the table where explanation data is stored. The table is created if it does not exist. A fully qualified table name must be specified (schema_name.table_name). If the table already exists, an error is returned.

To view ML_EXPLAIN_TABLE results, query the output table; for example:

SELECT * FROM heatwaveml_bench.census_explanations;

The ML_EXPLAIN_TABLE output table includes the features used to make the explanations, the explanations, and feature_attribution columns that provide a weighted numerical value that indicates feature importance.

From MySQL 8.0.30, ML_EXPLAIN_TABLE output also includes a Notes field that identifies features with the greatest impact on predictions. ML_EXPLAIN_TABLE reports a warning if the model is low quality.