After running the ML_TRAIN routine, use the ML_EXPLAIN routine to train prediction explainers and model explainers for HeatWave AutoML. In earlier releases, the ML_TRAIN routine itself trains the default Permutation Importance model explainer and prediction explainer.
Explanations help you understand which features have the most influence on a prediction. Feature importance is presented as a value ranging from -1 to 1. A positive value indicates that a feature contributed toward the prediction. A negative value indicates that the feature contributed toward a different prediction; for example, if a feature in a loan approval model with two possible predictions ('approve' and 'reject') has a negative value for an 'approve' prediction, that feature would have a positive value for a 'reject' prediction. A value of 0 or near 0 indicates that the feature value has no impact on the prediction to which it applies.
Prediction explainers are used when you run the ML_EXPLAIN_ROW and ML_EXPLAIN_TABLE routines to generate explanations for specific predictions (a usage sketch follows the list below). You must train a prediction explainer for the model before you can use those routines. The ML_EXPLAIN routine can train these prediction explainers:
- The Permutation Importance prediction explainer, specified as permutation_importance, is the default prediction explainer. It explains the prediction for a single row or table.
- The SHAP prediction explainer, specified as shap, uses feature importance values to explain the prediction for a single row or table.
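The following sketch shows the downstream usage, assuming the SELECT-based ML_EXPLAIN_ROW signature with a trailing options argument (check the routine reference for your release) and a model handle already stored in @iris_model, as in the examples later in this section:
mysql> SELECT sys.ML_EXPLAIN_ROW(JSON_OBJECT('sepal length', 7.3, 'sepal width', 2.9,
          'petal length', 6.3, 'petal width', 1.8),
          @iris_model, JSON_OBJECT('prediction_explainer', 'shap'));
The result includes per-feature importance values in the -1 to 1 range described above.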
Model explainers are used when you run the ML_EXPLAIN routine to explain what the model learned from the training dataset. The model explainer provides a list of feature importances to show which features the model considered important based on the entire training dataset. The ML_EXPLAIN routine can train these model explainers:
- The Partial Dependence model explainer, specified as partial_dependence, shows how changing the values of one or more columns changes the value that the model predicts. Training this explainer requires additional options, as the second example below demonstrates.
- The SHAP model explainer, specified as shap, produces global feature importance values based on Shapley values.
- The Fast SHAP model explainer, specified as fast_shap, is a subsampling version of the SHAP model explainer that usually has a faster runtime.
- The Permutation Importance model explainer, specified as permutation_importance, is the default model explainer.
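For instance, to train the Fast SHAP model explainer rather than the default, the call mirrors the examples later in this section (the table, target column, and model handle below are the same illustrative values used there):
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', 'ml_data.iris_train_user1_1636729526',
          JSON_OBJECT('model_explainer', 'fast_shap', 'prediction_explainer', 'permutation_importance'));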
The model explanation is stored in the model catalog along with the machine learning model (see Section 3.14.1, “The Model Catalog”). If you run ML_EXPLAIN again for the same model handle and model explainer, the stored explanation is overwritten with the new result.
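To inspect the stored result, you can query the model catalog directly. A minimal sketch, assuming the catalog resides in the ML_SCHEMA_user1 schema and keeps the explanation in a model_explanation column; the actual schema and column layout is described in Section 3.14.1, “The Model Catalog”:
mysql> SELECT model_explanation FROM ML_SCHEMA_user1.MODEL_CATALOG
       WHERE model_handle = 'ml_data.iris_train_user1_1636729526';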
Before you run ML_EXPLAIN, you must load the model, for example:
mysql> CALL sys.ML_MODEL_LOAD('ml_data.iris_train_user1_1636729526', NULL);
The following example runs ML_EXPLAIN to train the SHAP model explainer and the Permutation Importance prediction explainer for the model:
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', 'ml_data.iris_train_user1_1636729526',
JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance'));
Where:

- ml_data.iris_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).
- class is the name of the target column, which contains ground truth values.
- ml_data.iris_train_user1_1636729526 is the model handle for the model in the model catalog. You can use a session variable to specify the model handle instead, written as @var_name.
- The JSON_OBJECT argument is a list of key-value pairs naming the model explainer and prediction explainer that are to be trained for the model. In this case, model_explainer specifies shap for the SHAP model explainer, and prediction_explainer specifies permutation_importance for the Permutation Importance prediction explainer.
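For example, to use a session variable in place of the literal handle (the same call, rewritten; @iris_model is reused by the next example):
mysql> SET @iris_model = 'ml_data.iris_train_user1_1636729526';
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model,
          JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance'));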
This example runs ML_EXPLAIN to train the Partial Dependence model explainer (which requires extra options) and the SHAP prediction explainer for the model:
mysql> CALL sys.ML_EXPLAIN('ml_data.iris_train', 'class', @iris_model,
JSON_OBJECT('columns_to_explain', JSON_ARRAY('sepal width'),
'target_value', 'Iris-setosa', 'model_explainer',
'partial_dependence', 'prediction_explainer', 'shap'));
Where:

- columns_to_explain identifies the sepal width column; the explainer shows how changing the value in this column affects the model prediction. You can identify more than one column in the JSON array.
- target_value is a valid value that the target column containing ground truth values (in this case, class) can take.
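After this call, the trained SHAP prediction explainer can generate explanations in bulk. A sketch, assuming a test table ml_data.iris_test and an output table name of your choosing; verify the exact ML_EXPLAIN_TABLE signature in its routine reference:
mysql> CALL sys.ML_EXPLAIN_TABLE('ml_data.iris_test', @iris_model,
          'ml_data.iris_explanations', JSON_OBJECT('prediction_explainer', 'shap'));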
For the full ML_EXPLAIN option descriptions, see Section 3.16.2, “ML_EXPLAIN”.