Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.9Mb
PDF (A4) - 3.9Mb


HeatWave User Guide  /  ...  /  Generate Prediction Explanations for a Row of Data

6.5.6.1 Generate Prediction Explanations for a Row of Data

ML_EXPLAIN_ROW explains predictions for one or more rows of unlabeled data. You invoke the routine by using a SELECT statement.

This topic has the following sections.

Before You Begin
Unsupported Model Types

You cannot generate prediction explanations on a row of data for the following model types:

  • Forecasting

  • Recommendation

  • Anomaly detection

  • Anomaly detection for logs

  • Topic modeling

Prepare to Generate a Row Explanation

Before running ML_EXPLAIN_ROW, you must train, and then load the model you want to use.

  1. The following example trains a dataset with the classification machine learning task.

    mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
  2. The following example loads the trained model.

    mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);

For more information about training and loading models, see Train a Model and Load a Model.

After training and loading the model, you can generate prediction explanations for one or more rows. For parameter and option descriptions, see ML_EXPLAIN_ROW.

Generate a Row Prediction Explanation with the Default Permutation Importance Explainer

After training and loading a model, you can run ML_EXPLAIN_ROW to generate a row prediction explanation with the default Permutation Importance explainer. However, if you train the shap prediction explainer with ML_EXPLAIN, you need to run ML_EXPLAIN again with the permutation_importance explainer before running ML_EXPLAIN_ROW with the same explainer.

The following example enters a row of data to explain into a session variable. The session variable is then used in the ML_EXPLAIN_ROW routine.

  1. Define values for each column to predict. The column names must match the feature column names in the trained table.

    mysql> SET @variable = (JSON_OBJECT("column_name", value, "column_name", value, ...), model_handle, options);

    In the following example, assign the data to analyze into the @row_input session variable.

    mysql> SET @row_input = JSON_OBJECT( 
              "age", 31, 
              "workclass", "Private", 
              "fnlwgt", 45781, 
              "education", "Masters", 
              "education-num", 14, 
              "marital-status", "Married-civ-spouse", 
              "occupation", "Prof-specialty", 
              "relationship", "Not-in-family", 
              "race", "White", 
              "sex", "Female", 
              "capital-gain", 14084, 
              "capital-loss", 2042, 
              "hours-per-week", 40, 
              "native-country", "India");
  2. Run the ML_EXPLAIN_ROW routine.

    mysql> SELECT sys.ML_EXPLAIN_ROW(input_data, model_handle, [options]);

    In the following example, include the session variable previously created. Optionally, use \G to display the output in an easily readable format. The output is similar to the following:

    mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @census_model, JSON_OBJECT('prediction_explainer', 'permutation_importance'))\G
    *************************** 1. row ***************************
    sys.ML_EXPLAIN_ROW(@row_input, @census_model, 
              JSON_OBJECT('prediction_explainer', 'permutation_importance')): 
              {
                    "age": 31,
                    "sex": "Female",
                    "race": "White",
                    "Notes": "capital-gain (14084) had the largest impact towards predicting >50K",
                    "fnlwgt": 45781,
                    "education": "Masters",
                    "workclass": "Private",
                    "Prediction": ">50K",
                    "ml_results": {
                       "notes": "capital-gain (14084) had the largest impact towards predicting >50K",
                       "predictions": {
                           "revenue": ">50K"
                        },
                        "attributions": {
                            "age": 0.34,
                            "sex": 0,
                            "race": 0,
                            "fnlwgt": 0,
                            "education": 0,
                            "workclass": 0,
                            "occupation": 0,
                            "capital-gain": 0.97,
                            "capital-loss": 0,
                            "relationship": 0,
                            "education-num": 0.04,
                            "hours-per-week": 0,
                            "marital-status": 0
                        }
                    },
                    "occupation": "Prof-specialty",
                    "capital-gain": 14084,
                    "capital-loss": 2042,
                    "relationship": "Not-in-family",
                    "education-num": 14,
                    "hours-per-week": 40,
                    "marital-status": "Married-civ-spouse",
                    "native-country": "India",
                    "age_attribution": 0.34,
                    "sex_attribution": 0,
                    "race_attribution": 0,
                    "fnlwgt_attribution": 0,
                    "education_attribution": 0,
                    "workclass_attribution": 0,
                    "occupation_attribution": 0,
                    "capital-gain_attribution": 0.97,
                    "capital-loss_attribution": 0,
                    "relationship_attribution": 0,
                    "education-num_attribution": 0.04,
                    "hours-per-week_attribution": 0,
                    "marital-status_attribution": 0
              }
    1 row in set (6.3072 sec)

The output provides an explanation on the column that had the largest impact towards the prediction, and the column that contributed the most against the prediction.

Generate a Row Prediction Explanation with the SHAP Explainer

To generate a row prediction explanation with the SHAP explainer, you must first run the SHAP explainer with ML_EXPLAIN.

  1. Run the ML_EXPLAIN routine.

    mysql> CALL sys.ML_EXPLAIN ('table_name', 'target_column_name', model_handle, [options]);

    The following example runs the shap explainer.

    mysql> CALL sys.ML_EXPLAIN('census_data.census_train', 'revenue', @census_model, JSON_OBJECT('prediction_explainer', 'shap'));

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • revenue is the name of the target column, which contains ground truth values.

    • @census_model is the session variable for the trained model.

    • prediction_explainer is set to shap for the SHAP prediction explainer.

  2. Define values for each column to predict. The column names must match the feature column names in the trained table.

    mysql> SET @variable = (JSON_OBJECT("column_name", value, "column_name", value, ...), model_handle, options);

    In the following example, assign the data to analyze into the @row_input session variable.

    mysql> SET @row_input = JSON_OBJECT( 
              "age", 25, 
              "workclass", "Private", 
              "fnlwgt", 226802, 
              "education", "11th", 
              "education-num", 7, 
              "marital-status", "Never-married", 
              "occupation", "Machine-op-inspct", 
              "relationship", "Own-child", 
              "race", "Black", 
              "sex", "Male", 
              "capital-gain", 0, 
              "capital-loss", 0, 
              "hours-per-week", 40, 
              "native-country", "United-States");
  3. Run the ML_EXPLAIN_ROW routine.

    mysql> SELECT sys.ML_EXPLAIN_ROW(input_data, model_handle, [options]);

    In the following example run the same shap prediction explainer. Optionally, use \G to display the output in an easily readable format.

    mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @census_model, JSON_OBJECT('prediction_explainer', 'shap'))\G
    *************************** 1. row ***************************
    sys.ML_EXPLAIN_ROW(@row_input, @census_model, 
      JSON_OBJECT('prediction_explainer', 'shap')): 
      {
        "age": 25,
        "sex": "Male",
        "race": "Black",
        "fnlwgt": 226802,
        "education": "11th",
        "workclass": "Private",
        "Prediction": "<=50K",
        "ml_results": {
            "predictions": {
                "revenue": "<=50K"
            },
            "attributions": {
                "age_attribution": 0.03154012309521936,
                "sex_attribution": -0.002995059121088509,
                "race_attribution": 0.0051264089998398765,
                "fnlwgt_attribution": -0.003139455788215409,
                "education_attribution": 0.0013752672453250653,
                "workclass_attribution": 0,
                "occupation_attribution": 0.020919219303459986,
                "capital-gain_attribution": 0.015089815859614985,
                "capital-loss_attribution": 0.0033537962775555263,
                "relationship_attribution": 0.027744370891787523,
                "education-num_attribution": 0.0284122832892542,
                "hours-per-week_attribution": 0.009110644648945954,
                "marital-status_attribution": 0.036222463769272406
            }
        },
        "occupation": "Machine-op-inspct",
        "capital-gain": 0,
        "capital-loss": 0,
        "relationship": "Own-child",
        "education-num": 7,
        "hours-per-week": 40,
        "marital-status": "Never-married",
        "native-country": "United-States",
        "age_attribution": 0.0315401231,
        "sex_attribution": -0.0029950591,
        "race_attribution": 0.005126409,
        "fnlwgt_attribution": -0.0031394558,
        "education_attribution": 0.0013752672,
        "workclass_attribution": 0,
        "occupation_attribution": 0.0209192193,
        "capital-gain_attribution": 0.0150898159,
        "capital-loss_attribution": 0.0033537963,
        "relationship_attribution": 0.0277443709,
        "education-num_attribution": 0.0284122833,
        "hours-per-week_attribution": 0.0091106446,
        "marital-status_attribution": 0.0362224638
    }
    1 row in set (4.3007 sec)

    The output displays feature importance values for each column.

What's Next