MySQL HeatWave User Guide  /  HeatWave ML  /  Explanations

3.7 Explanations

Explanations are generated by running ML_EXPLAIN_ROW or ML_EXPLAIN_TABLE on unlabeled data; that is, data with the same feature columns as the data used to train the model but no target column.

Explanations help you understand which features have the most influence on a prediction. Feature importance is presented as a value ranging from -1 to 1. A positive value indicates that a feature contributed toward the prediction. A negative value indicates that the feature contributed toward a different prediction; for example, if a feature in a loan approval model with two possible predictions ('approve' and 'reject') has a negative value for an 'approve' prediction, that feature would have a positive value for a 'reject' prediction. A value of 0 or near 0 indicates that the feature value has no impact on the prediction to which it applies.

ML_EXPLAIN_ROW generates explanations for one or more rows of data. ML_EXPLAIN_TABLE generates explanations on an entire table of data and saves the results to an output table. ML_EXPLAIN_* routines limit explanations to the 100 most relevant features.

From MySQL HeatWave 8.0.31, after you run the ML_TRAIN routine, use the ML_EXPLAIN routine to train prediction explainers and model explainers for HeatWave ML. You must train a prediction explainer before you can use ML_EXPLAIN_ROW and ML_EXPLAIN_TABLE. In earlier releases, the ML_TRAIN routine trains the default Permutation Importance model and prediction explainers. See Section 3.5, “Training Explainers”.
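As a sketch of that step, a Permutation Importance prediction explainer might be trained as follows; the training table and target column names here are illustrative, and the full ML_EXPLAIN syntax is described in Section 3.5, “Training Explainers”:

CALL sys.ML_EXPLAIN('heatwaveml_bench.census_train', 'revenue', @census_model, 
JSON_OBJECT('prediction_explainer', 'permutation_importance'));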

Row Explanations

ML_EXPLAIN_ROW explains predictions for one or more rows of unlabeled data. It is invoked using a SELECT statement. For ML_EXPLAIN_ROW parameter descriptions, see Section 3.10.6, “ML_EXPLAIN_ROW”.

Before running ML_EXPLAIN_ROW, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.9.3, “Loading Models”.

The following example generates explanations for a single row of unlabeled data, which is assigned to a @row_input session variable:

SET @row_input = JSON_OBJECT( 
"age", 25, 
"workclass", "Private", 
"fnlwgt", 226802, 
"education", "11th", 
"education-num", 7, 
"marital-status", "Never-married", 
"occupation", "Machine-op-inspct", 
"relationship", "Own-child", 
"race", "Black", 
"sex", "Male", 
"capital-gain", 0, 
"capital-loss", 0, 
"hours-per-week", 40, 
"native-country", "United-States");

SELECT sys.ML_EXPLAIN_ROW(@row_input, @census_model, JSON_OBJECT(
'prediction_explainer', 'permutation_importance'));

where:

  • @row_input is a session variable containing a row of unlabeled data. The data is specified in JSON key-value format. The column names must match the feature column names in the training dataset.

  • @census_model is the session variable that contains the model handle.

  • prediction_explainer provides the name of the prediction explainer that you have trained for this model, either the Permutation Importance prediction explainer or the SHAP prediction explainer. You train this using the ML_EXPLAIN routine (see Section 3.5, “Training Explainers”).

ML_EXPLAIN_ROW output includes a prediction, the features used to make the prediction, and a weighted numerical value that indicates feature importance, in the following format: ("feature_attribution": value). From MySQL 8.0.30, output includes a Notes field that identifies features with the greatest impact on predictions and reports a warning if the model is low quality.

You can also run ML_EXPLAIN_ROW on multiple rows of data selected from a table. For an example, refer to the syntax examples in Section 3.10.6, “ML_EXPLAIN_ROW”.
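As a sketch of that pattern, each feature column is mapped into a JSON_OBJECT and the routine is selected over rows of a table; the column names follow the census example above (hyphenated column names must be quoted with backticks when referenced as columns):

SELECT sys.ML_EXPLAIN_ROW(JSON_OBJECT(
"age", age, 
"workclass", workclass, 
"fnlwgt", fnlwgt, 
"education", education, 
"education-num", `education-num`, 
"marital-status", `marital-status`, 
"occupation", occupation, 
"relationship", relationship, 
"race", race, 
"sex", sex, 
"capital-gain", `capital-gain`, 
"capital-loss", `capital-loss`, 
"hours-per-week", `hours-per-week`, 
"native-country", `native-country`), 
@census_model, 
JSON_OBJECT('prediction_explainer', 'permutation_importance'))
FROM heatwaveml_bench.census_test LIMIT 5;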

Table Explanations

ML_EXPLAIN_TABLE explains predictions for an entire table of unlabeled data and saves results to an output table. Explanations are performed in parallel. For ML_EXPLAIN_TABLE parameter descriptions, see Section 3.10.7, “ML_EXPLAIN_TABLE”.

ML_EXPLAIN_TABLE is a compute-intensive process. To limit runtime, split large tables into batches of 10 to 100 rows and run the routine on each batch.
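For example, a large table can be split into fixed-size batch tables with LIMIT and OFFSET; the batch table names here are illustrative, and in practice you should add an ORDER BY on a unique column so that the batches are deterministic:

CREATE TABLE heatwaveml_bench.census_batch1 AS SELECT * FROM heatwaveml_bench.census_test 
LIMIT 100 OFFSET 0;

CREATE TABLE heatwaveml_bench.census_batch2 AS SELECT * FROM heatwaveml_bench.census_test 
LIMIT 100 OFFSET 100;

CALL sys.ML_EXPLAIN_TABLE('heatwaveml_bench.census_batch1', @census_model, 
'heatwaveml_bench.census_explanations_batch1', JSON_OBJECT('prediction_explainer', 'shap'));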

Before running ML_EXPLAIN_TABLE, ensure that the model you want to use is loaded; for example:

CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about loading models, see Section 3.9.3, “Loading Models”.

The following example creates a table with 10 rows of unlabeled test data and generates explanations for that table:

CREATE TABLE heatwaveml_bench.census_test_subset AS SELECT * FROM heatwaveml_bench.census_test 
LIMIT 10;  
  
CALL sys.ML_EXPLAIN_TABLE('heatwaveml_bench.census_test_subset', @census_model, 
'heatwaveml_bench.census_explanations', JSON_OBJECT('prediction_explainer', 'shap'));

where:

  • heatwaveml_bench.census_test_subset is the fully qualified name of the input table (schema_name.table_name). The table must have the same feature column names as the training dataset but no target column.

  • @census_model is the session variable that contains the model handle.

  • heatwaveml_bench.census_explanations is the table where explanation data is stored. A fully qualified table name must be specified (schema_name.table_name). The table is created if it does not exist; if it already exists, an error is returned.

  • prediction_explainer provides the name of the prediction explainer that you have trained for this model, either the Permutation Importance prediction explainer or the SHAP prediction explainer. You train this using the ML_EXPLAIN routine (see Section 3.5, “Training Explainers”).

To view ML_EXPLAIN_TABLE results, query the output table; for example:

SELECT * FROM heatwaveml_bench.census_explanations;

The ML_EXPLAIN_TABLE output table includes the features used to make the explanations, the explanations, and feature_attribution columns that provide a weighted numerical value that indicates feature importance.
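For instance, to inspect the rows where a particular feature influenced predictions most strongly, you can sort on its attribution column; this assumes the output table names the attribution column for the age feature age_attribution, and the exact column names depend on your feature columns:

SELECT age, age_attribution FROM heatwaveml_bench.census_explanations 
ORDER BY ABS(age_attribution) DESC LIMIT 5;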

From MySQL 8.0.30, ML_EXPLAIN_TABLE output also includes a Notes field that identifies features with the greatest impact on predictions. ML_EXPLAIN_TABLE reports a warning if the model is low quality.