
3.6 Training Explainers

As of MySQL 8.0.31, after you train a model with the ML_TRAIN routine, use the ML_EXPLAIN routine to train prediction explainers and model explainers for it. In earlier releases, ML_TRAIN itself trains the default Permutation Importance model explainer and prediction explainer.

Explanations help you understand which features have the most influence on a prediction. Feature importance is presented as a value ranging from -1 to 1. A positive value indicates that a feature contributed toward the prediction. A negative value indicates that the feature contributed toward a different prediction. For example, consider a loan approval model with two possible predictions ('approve' and 'reject'). If a feature has a negative value for an 'approve' prediction, that feature would have a positive value for a 'reject' prediction. A value of 0 or near 0 indicates that the feature value has no impact on the prediction to which it applies.
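The following sketch illustrates the sign convention with a hypothetical two-class logistic model. The weights, feature names, and baseline values are invented for illustration. This is not how HeatWave AutoML computes or scales its importance values. In a model like this, a feature's contribution to the log-odds of 'approve' is exactly the negation of its contribution to 'reject'.

```python
# Hypothetical loan-approval model: log-odds(approve) = bias + sum(w * x).
# Weights and baseline ("average applicant") values are invented for illustration.
WEIGHTS = {"income": 0.8, "debt_ratio": -1.5, "years_employed": 0.3}
BASELINE = {"income": 1.0, "debt_ratio": 0.4, "years_employed": 5.0}


def contributions(row, label):
    """Per-feature contribution to the log-odds of `label`, relative to BASELINE."""
    # For two classes, log-odds(reject) = -log-odds(approve), so the sign flips.
    sign = 1.0 if label == "approve" else -1.0
    return {f: sign * w * (row[f] - BASELINE[f]) for f, w in WEIGHTS.items()}


applicant = {"income": 1.2, "debt_ratio": 0.7, "years_employed": 5.0}
approve = contributions(applicant, "approve")
reject = contributions(applicant, "reject")
for feature in WEIGHTS:
    print(f"{feature:15s} approve={approve[feature]:+.2f} reject={reject[feature]:+.2f}")
```

In this toy model, the applicant's above-baseline debt_ratio has a negative value for 'approve' and an equally positive value for 'reject'. years_employed equals its baseline value and contributes 0 to either prediction.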

Prediction explainers are used when you run the ML_EXPLAIN_ROW and ML_EXPLAIN_TABLE routines to generate explanations for specific predictions. You must train a prediction explainer for the model before you can use those routines. The ML_EXPLAIN routine can train these prediction explainers:

  • The Permutation Importance prediction explainer, specified as permutation_importance, is the default prediction explainer, which explains the prediction for a single row or table.

  • The SHAP prediction explainer, specified as shap, uses feature importance values to explain the prediction for a single row or table.
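SHAP explanations are based on Shapley values. The following conceptual sketch computes exact Shapley values for a single row by enumerating every coalition of features. A feature that is absent from a coalition is set to a baseline value. The `predict` function, the row, and the baseline are hypothetical. HeatWave AutoML's SHAP implementation is not shown here and will differ.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row; absent features take their baseline value."""
    n = len(row)
    features = range(n)

    def value(coalition):
        # Model output with only the features in `coalition` taken from `row`.
        x = [row[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model with an interaction term between features 0 and 1.
def predict(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1] - 1.0 * x[2]


row = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, row, baseline)
print(phi)
```

The values sum to predict(row) - predict(baseline). The interaction term is split equally between features 0 and 1. Exact enumeration needs a number of model evaluations that grows exponentially with the number of features. This is why practical SHAP tools approximate the values, for example by sampling.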

Model explainers are used when you run the ML_EXPLAIN routine to explain what the model learned from the training dataset. The model explainer provides a list of feature importances that shows which features the model considered important across the entire training dataset. The ML_EXPLAIN routine can train these model explainers:

  • The Partial Dependence model explainer, specified as partial_dependence, shows how changing the values of one or more columns changes the value that the model predicts. When you train this model explainer, you must specify additional options.

  • The SHAP model explainer, specified as shap, produces global feature importance values based on Shapley values.

  • The Fast SHAP model explainer, specified as fast_shap, is a subsampling version of the SHAP model explainer which usually has a faster runtime.

  • The Permutation Importance model explainer, specified as permutation_importance, is the default model explainer.
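Permutation importance measures how much a model's score drops when the values of one column are randomly shuffled, which breaks that column's relationship with the target. The sketch below is a generic illustration of the technique on a hypothetical classifier and synthetic data. Accuracy is assumed as the score. The sketch does not reproduce HeatWave AutoML's implementation, scoring metric, or normalization.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by its shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(base - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical classifier that only looks at feature 0.
def predict(row):
    return "approve" if row[0] > 0.5 else "reject"


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [predict(r) for r in rows]
print(permutation_importance(predict, rows, labels))
```

Feature 1 gets an importance of exactly 0 because the model ignores it. Shuffling feature 0 lowers accuracy by roughly half.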

The model explanation is stored in the model catalog along with the machine learning model (see Section 3.10.1, “The Model Catalog”). If you run ML_EXPLAIN again for the same model handle and model explainer, the stored model explanation is overwritten with the new result.

Before you run ML_EXPLAIN, you must load the model, for example:

mysql> CALL sys.ML_MODEL_LOAD('ml_data.iris_train_user1_1636729526', NULL);

The following example runs ML_EXPLAIN to train the SHAP model explainer and the Permutation Importance prediction explainer for the model:

mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', 'ml_data.iris_train_user1_1636729526', 
          JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance'));

Where:

  • ml_data.iris_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

  • class is the name of the target column, which contains ground truth values.

  • ml_data.iris_train_user1_1636729526 is the model handle for the model in the model catalog. Alternatively, you can specify the model handle with a session variable, written as @var_name. The next example does this with @iris_model.

  • The JSON_OBJECT argument is a list of key-value pairs naming the model explainer and the prediction explainer to train for the model. In this case, model_explainer specifies shap for the SHAP model explainer, and prediction_explainer specifies permutation_importance for the Permutation Importance prediction explainer.

This example runs ML_EXPLAIN to train the Partial Dependence model explainer (which requires extra options) and the SHAP prediction explainer for the model:

mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model, 
          JSON_OBJECT('columns_to_explain', JSON_ARRAY('sepal width'), 
          'target_value', 'Iris-setosa', 'model_explainer', 
          'partial_dependence', 'prediction_explainer', 'shap'));

Where:

  • columns_to_explain identifies the sepal width column. The explainer shows how changing the value in this column affects the model's predictions. You can identify more than one column in the JSON array.

  • target_value is a value that the target column (in this case, class, which contains the ground truth values) can take. Here it is Iris-setosa.
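Partial dependence for a column is commonly computed as follows:

  • Fix the column to each value on a grid, for every row of the dataset.
  • Average the model's predicted probability of the target value at each grid point.
  • Report how that average changes across the grid.

The sketch below illustrates the technique. The names `predict_proba`, `column`, and `grid` are assumptions, and the toy model and data are invented. They loosely mirror the columns_to_explain and target_value options. HeatWave AutoML's grid choice and output format may differ.

```python
import math


def partial_dependence(predict_proba, rows, column, grid, target_value):
    """Average predicted probability of target_value with `column` fixed to each grid value."""
    curve = []
    for g in grid:
        total = 0.0
        for row in rows:
            modified = {**row, column: g}  # copy the row, override one column
            total += predict_proba(modified)[target_value]
        curve.append((g, total / len(rows)))
    return curve


# Hypothetical model: wider sepals and shorter petals make Iris-setosa more likely.
def predict_proba(row):
    z = 3.0 * (row["sepal width"] - 3.0) - 1.5 * (row["petal length"] - 1.5)
    p = 1.0 / (1.0 + math.exp(-z))
    return {"Iris-setosa": p, "Iris-versicolor": (1 - p) / 2, "Iris-virginica": (1 - p) / 2}


rows = [
    {"sepal width": 3.5, "petal length": 1.4},
    {"sepal width": 3.0, "petal length": 4.5},
    {"sepal width": 2.8, "petal length": 5.6},
]
curve = partial_dependence(predict_proba, rows, "sepal width",
                           [2.0, 2.5, 3.0, 3.5, 4.0], "Iris-setosa")
for value, prob in curve:
    print(f"sepal width={value}: P(Iris-setosa)={prob:.3f}")
```

The model's other columns keep their real values in each row. The curve therefore shows the average effect of sepal width alone, across the observed data.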

For the full ML_EXPLAIN option descriptions, see Section 3.11.2, “ML_EXPLAIN”.