Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


3.16.9 ML_SCORE

ML_SCORE scores a model by generating predictions using the feature columns in a labeled dataset as input and comparing the predictions to ground truth values in the target column of the labeled dataset. The dataset used with ML_SCORE should have the same feature columns as the dataset used to train the model but the data should be different; for example, you might reserve 20 to 30 percent of the labeled training data for scoring.

ML_SCORE returns a computed metric indicating the quality of the model.

ML_SCORE Syntax

mysql> CALL sys.ML_SCORE(table_name, target_column_name, model_handle, metric, score, [options]);

options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ['threshold', 'N']
          ['topk', 'N']
          ['remove_seen', {'true'|'false'}]
          }
}

ML_SCORE parameters:

  • table_name: Specifies the fully qualified name of the table used to compute model quality (schema_name.table_name). The table must contain the same columns as the training dataset.

  • target_column_name: Specifies the name of the target column containing ground truth values.

    Forecasting does not require target_column_name, and it can be set to NULL.

  • model_handle: Specifies the model handle or a session variable containing the model handle.

  • metric: Specifies the name of the metric. See Section 3.16.14, “Optimization and Scoring Metrics”.

  • score: Specifies the user-defined variable name for the computed score. The ML_SCORE routine populates the variable. User variables are written as @var_name. The examples in this guide use @score as the variable name. Any valid name for a user-defined variable is permitted, for example @my_score.

  • options: A set of options in JSON format. This parameter only supports the anomaly detection and recommendation tasks. For all other tasks, set this parameter to NULL.

    • threshold: The optional threshold for use with the anomaly_detection and recommendation tasks.

      Use with the anomaly_detection task to convert anomaly scores to 1: an anomaly or 0: normal. 0 < threshold < 1. The default value is (1 - contamination)-th percentile of all the anomaly scores.

      Use with the recommendation task and ranking metrics to define positive feedback, and a relevant sample. All rankings at or above the threshold are implied to provide positive feedback. All rankings below the threshold are implied to provide negative feedback. The default value is 1.

    • topk: The optional top K rows for use with the anomaly_detection and recommendation tasks. A positive integer between 1 and the table length.

      For the anomaly_detection task the results include the top K rows with the highest anomaly scores. It is an integer between 1 and the table length. If topk is not set, ML_SCORE uses threshold.

      For an anomaly_detection task, do not set both threshold and topk. Use threshold or topk, or set options to NULL.

      For the recommendation task and ranking metrics, the number of recommendations to provide. The default is 3.

      A recommendation task and ranking metrics can use both threshold and topk.

    • remove_seen: If the input table overlaps with the training table, and remove_seen is true, then the model will not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.

Syntax Example

  • The following example runs ML_SCORE on the ml_data.iris_train table to determine model quality:

    mysql> CALL sys.ML_SCORE('ml_data.iris_validate', 'class', @iris_model, 
              'balanced_accuracy', @score, NULL);
    
    mysql> SELECT @score;
    +--------------------+
    | @score             |
    +--------------------+
    | 0.9583333134651184 |
    +--------------------+

See also: