
3.15.13 Optimization and Scoring Metrics

The ML_TRAIN routine includes the optimization_metric option, and the ML_SCORE routine includes the metric option. Both of these options define a metric that must be compatible with the task type and the target data. Section 3.15.12, “Model Metadata” includes the optimization_metric field.

For more information about scoring metrics, see: scikit-learn.org. For more information about forecasting metrics, see: sktime.org and statsmodels.org.
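
For example, a minimal sketch that trains a classification model with the optimization_metric option and scores it with the same metric. The mlcorpus schema, table, column, and session-variable names are placeholders, and balanced_accuracy stands in for any metric that is compatible with the task type and the target data:

-- train, optimizing for the chosen metric
mysql> CALL sys.ML_TRAIN('mlcorpus.census_train', 'revenue',
          JSON_OBJECT('task', 'classification',
                      'optimization_metric', 'balanced_accuracy'),
          @census_model);

-- load the model before scoring
mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);

-- score with the metric option, then retrieve the result
mysql> CALL sys.ML_SCORE('mlcorpus.census_test', 'revenue', @census_model,
          'balanced_accuracy', @score, NULL);

mysql> SELECT @score;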

  • Classification metrics

  • Regression metrics

  • Forecasting metrics

  • Anomaly detection metrics

    ML_SCORE only. Not supported for ML_TRAIN. Anomaly detection metrics fall into three groups, depending on which options they use (a scoring sketch follows this list):

    • Metrics that use neither option.

      Do not specify the threshold or topk options with these metrics.

    • Metrics that use the threshold option.

      Do not specify the topk option with these metrics.

    • Metrics that require the topk option.

      Do not specify the threshold option with these metrics.

      • precision_k is an Oracle implementation of a common metric for fraud detection and lead scoring.
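
    For illustration, a sketch of scoring an anomaly detection model against each group with ML_SCORE. The mlcorpus table, column, and model names are placeholders, and the metric names are assumed examples of each group, not a complete list:

    -- first group: no threshold or topk option
    mysql> CALL sys.ML_SCORE('mlcorpus.logs_test', 'label', @anomaly_model,
              'roc_auc', @score, NULL);

    -- second group: uses the threshold option
    mysql> CALL sys.ML_SCORE('mlcorpus.logs_test', 'label', @anomaly_model,
              'f1', @score, JSON_OBJECT('threshold', 0.9));

    -- third group: requires the topk option
    mysql> CALL sys.ML_SCORE('mlcorpus.logs_test', 'label', @anomaly_model,
              'precision_k', @score, JSON_OBJECT('topk', 100));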

  • Recommendation model metrics

    • Rating metrics to use with recommendation models that use explicit feedback.

      Use with ML_SCORE and ML_TRAIN.

    • Ranking metrics to use with recommendation models that use implicit feedback.

      ML_SCORE only. Not supported for ML_TRAIN.

      If a user and item combination in the input table is not unique, the input table is grouped by the user and item columns, and the result is the average of the rankings.

      If the input table overlaps with the training table, and remove_seen is true (the default setting), the model does not repeat a recommendation and ignores the overlapping items. A scoring sketch follows the metric definitions below.

      • precision_at_k is the number of relevant topk recommended items divided by the total topk recommended items for a particular user:

        precision_at_k = (relevant topk recommended items) / (total topk recommended items)

        For example, if 7 out of 10 items are relevant for a user, and topk is 10, then precision_at_k is 70%.

        The precision_at_k value for the input table is the average for all users. If remove_seen is true (the default setting), the average only includes users for whom the model can make a recommendation. If a user has implicitly ranked every item in the training table, the model cannot recommend any more items for that user, and that user is excluded from the average calculation when remove_seen is true.

      • recall_at_k is the number of relevant topk recommended items divided by the total relevant items for a particular user:

        recall_at_k = (relevant topk recommended items) / (total relevant items)

        For example, suppose there is a total of 20 relevant items for a user. If topk is 10, and 7 of the topk recommended items are relevant, then recall_at_k is 7 / 20 = 35%.

        The recall_at_k value for the input table is the average for all users.

      • hit_ratio_at_k is the number of relevant topk recommended items divided by the total relevant items for all users:

        hit_ratio_at_k = (relevant topk recommended items, all users) / (total relevant items, all users)

        Because the per-user hit ratio equals the per-user recall, the average of hit_ratio_at_k for the input table is recall_at_k. If there is only one user, hit_ratio_at_k is the same as recall_at_k.

      • ndcg_at_k is normalized discounted cumulative gain, which is the discounted cumulative gain of the relevant topk recommended items divided by the discounted cumulative gain of the relevant topk items for a particular user.

        The discounted gain of an item is its true rating divided by log2(r + 1), where r is the rank of the item in the relevant topk items. The more a user prefers an item, the higher its rating and the lower (better) its rank.
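
        Expressed in the same form as the other metrics, with rating(r) denoting the true rating of the item at rank r:

        dcg_at_k = sum of rating(r) / log2(r + 1), for r = 1 to k

        ndcg_at_k = (dcg of relevant topk recommended items) / (dcg of relevant topk items)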

        The ndcg_at_k value for the input table is the average for all users.
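
      For illustration, a sketch of scoring a recommendation model with a ranking metric. The mlcorpus table, column, and model names are placeholders, and passing remove_seen explicitly in the ML_SCORE options is an assumption based on the default behavior described above:

      -- ranking metrics require topk; remove_seen defaults to true
      mysql> CALL sys.ML_SCORE('mlcorpus.ratings_test', 'rating', @rec_model,
                'precision_at_k', @score, JSON_OBJECT('topk', 10, 'remove_seen', true));

      mysql> SELECT @score;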