As of MySQL 8.0.31, after running the
ML_TRAIN routine, use the
ML_EXPLAIN routine to train
prediction explainers and model explainers for HeatWave AutoML. In
earlier releases, the ML_TRAIN
routine trains the default Permutation Importance model and prediction explainers.
Explanations help you understand which features have the most influence on a prediction. Feature importance is presented as a value ranging from -1 to 1. A positive value indicates that a feature contributed toward the prediction. A negative value indicates that the feature contributed toward a different prediction; for example, if a feature in a loan approval model with two possible predictions ('approve' and 'reject') has a negative value for an 'approve' prediction, that feature would have a positive value for a 'reject' prediction. A value of 0 or near 0 indicates that the feature value has no impact on the prediction to which it applies.
Prediction explainers are used when you run the
ML_EXPLAIN_ROW and ML_EXPLAIN_TABLE routines to
generate explanations for specific predictions. You must train a
prediction explainer for the model before you can use those
routines. The ML_EXPLAIN routine can train these prediction explainers:
The Permutation Importance prediction explainer, specified as
permutation_importance, is the default prediction explainer, which explains the prediction for a single row or table.
The SHAP prediction explainer, specified as
shap, uses feature importance values to explain the prediction for a single row or table.
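As a rough sketch of where a trained prediction explainer is then used, the calls below request explanations for a whole table and for a single row. The input table ml_data.iris_test, the output table ml_data.iris_explanations, and the feature names and values other than sepal width are hypothetical, and the argument order is assumed from the ML_EXPLAIN_TABLE and ML_EXPLAIN_ROW routine descriptions; check the routine reference for your release before relying on it.

```sql
-- Assumes the model is loaded and a SHAP prediction explainer was trained with ML_EXPLAIN.
-- ml_data.iris_test and ml_data.iris_explanations are hypothetical table names.
CALL sys.ML_EXPLAIN_TABLE('ml_data.iris_test', 'ml_data.iris_train_user1_1636729526',
     'ml_data.iris_explanations', JSON_OBJECT('prediction_explainer', 'shap'));

-- Explain a single row; the feature names and values are illustrative only.
SELECT sys.ML_EXPLAIN_ROW(
     JSON_OBJECT('sepal length', 7.3, 'sepal width', 2.9, 'petal length', 6.3, 'petal width', 1.8),
     'ml_data.iris_train_user1_1636729526',
     JSON_OBJECT('prediction_explainer', 'shap'));
```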
Model explainers are used when you run the
ML_EXPLAIN routine to explain
what the model learned from the training dataset. The model
explainer provides a list of feature importances to show what
features the model considered important based on the entire
training dataset. The ML_EXPLAIN
routine can train these model explainers:
The Partial Dependence model explainer, specified as
partial_dependence, shows how changing the values of one or more columns will change the value that the model predicts. When you train this model explainer, you need to specify some additional options.
The SHAP model explainer, specified as
shap, produces global feature importance values based on Shapley values.
The Fast SHAP model explainer, specified as
fast_shap, is a subsampling version of the SHAP model explainer which usually has a faster runtime.
The Permutation Importance model explainer, specified as
permutation_importance, is the default model explainer.
The model explanation is stored in the model catalog along with
the machine learning model (see
Section 3.12.1, “The Model Catalog”). If you run
ML_EXPLAIN again for the same
model handle and model explainer, the field is overwritten with
the new result.
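To inspect the stored explanation, a query along the following lines may work. It assumes that the model catalog is a MODEL_CATALOG table in a per-user schema named ML_SCHEMA_user1 and that the explanation is held in a column named model_explanation. Neither name is stated in this section, so confirm them against the model catalog description.

```sql
-- Schema, table, and column names are assumptions; adjust them to your model catalog.
SELECT model_explanation
  FROM ML_SCHEMA_user1.MODEL_CATALOG
 WHERE model_handle = 'ml_data.iris_train_user1_1636729526';
```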
Before you run ML_EXPLAIN, you
must load the model, for example:
mysql> CALL sys.ML_MODEL_LOAD('ml_data.iris_train_user1_1636729526', NULL);
The following example runs
ML_EXPLAIN to train the SHAP
model explainer and the Permutation Importance prediction
explainer for the model:
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', 'ml_data.iris_train_user1_1636729526', JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance'));
ml_data.iris_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).
class is the name of the target column, which contains ground truth values.
ml_data.iris_train_user1_1636729526 is the model handle for the model in the model catalog. You can use a session variable to specify the model handle instead, written with a leading @ (for example, @iris_model).
The JSON object is a list of key-value pairs naming the model explainer and prediction explainer that are to be trained for the model. In this case,
model_explainer is set to shap for the SHAP model explainer, and
prediction_explainer is set to permutation_importance for the Permutation Importance prediction explainer.
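The session-variable form mentioned above can be sketched as follows. The variable name @iris_model is simply a convenient label; any user-defined variable holding the model handle string should serve the same purpose.

```sql
-- Store the model handle once, then reuse it for loading and explaining.
SET @iris_model = 'ml_data.iris_train_user1_1636729526';
CALL sys.ML_MODEL_LOAD(@iris_model, NULL);
CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model,
     JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance'));
```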
This example runs ML_EXPLAIN to
train the Partial Dependence model explainer (which requires
extra options) and the SHAP prediction explainer for the model:
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model, JSON_OBJECT('columns_to_explain', JSON_ARRAY('sepal width'), 'target_value', 'Iris-setosa', 'model_explainer', 'partial_dependence', 'prediction_explainer', 'shap'));
columns_to_explain identifies the sepal width column for the explainer to explain how changing the value in this column affects the model. You can identify more than one column in the JSON array.
target_value is a valid value that the target column containing ground truth values (in this case,
class) can take.
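As an illustration of naming more than one column, the call might be extended as in this sketch. The petal width column is an assumption about the training table and does not appear in the example above; substitute columns that exist in your dataset.

```sql
-- 'petal width' is a hypothetical second column; @iris_model holds the model handle.
CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model,
     JSON_OBJECT('columns_to_explain', JSON_ARRAY('sepal width', 'petal width'),
                 'target_value', 'Iris-setosa',
                 'model_explainer', 'partial_dependence',
                 'prediction_explainer', 'shap'));
```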