ML_SCORE
scores a model by
generating predictions using the feature columns in a labeled
dataset as input and comparing the predictions to ground truth
values in the target column of the labeled dataset. The
dataset used with ML_SCORE
should have the same feature columns as the dataset used to
train the model but the data should be different; for example,
you might reserve 20 to 30 percent of the labeled training
data for scoring.
ML_SCORE
returns a computed
metric indicating the quality of the model.
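As an illustration of the hold-out idea described above, the following Python sketch reserves 25 percent of a labeled dataset for scoring and computes balanced accuracy from predictions and ground truth values. The helper names, the toy rows, and the stand-in predictions are hypothetical. The metric follows the common definition (the mean of per-class recall) and is not a statement of how ML_SCORE computes it internally.

```python
import random
from collections import defaultdict

def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle rows and reserve a fraction for scoring (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # Training rows first, scoring rows second; the two sets do not overlap.
    return shuffled[n_holdout:], shuffled[:n_holdout]

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, the common definition of balanced accuracy."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy labeled dataset: (row id, class label).
rows = [(i, "setosa" if i % 2 else "virginica") for i in range(20)]
train_rows, score_rows = split_holdout(rows)

# Stand-in ground truth and predictions for four scoring rows.
y_true = ["setosa", "setosa", "virginica", "virginica"]
y_pred = ["setosa", "virginica", "virginica", "virginica"]
ba = balanced_accuracy(y_true, y_pred)  # recall 0.5 and 1.0, so 0.75
```

In HeatWave AutoML the split would be done on tables rather than Python lists; the sketch only shows why the scoring data must be disjoint from the training data and what the returned number represents.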
mysql> CALL sys.ML_SCORE(table_name, target_column_name, model_handle, metric, score, [options]);
options: {
  JSON_OBJECT("key","value"[,"key","value"] ...)
    "key","value": {
    ['threshold', 'N']
    ['topk', 'N']
    ['remove_seen', {'true'|'false'}]
    }
}
ML_SCORE parameters:
- table_name: Specifies the fully qualified name of the table used to compute model quality (schema_name.table_name). The table must contain the same columns as the training dataset.
- target_column_name: Specifies the name of the target column containing ground truth values. Forecasting does not require target_column_name, and it can be set to NULL.
- model_handle: Specifies the model handle or a session variable containing the model handle.
- metric: Specifies the name of the metric. See Section 3.16.14, “Optimization and Scoring Metrics”.
- score: Specifies the user-defined variable name for the computed score. The ML_SCORE routine populates the variable. User variables are written as @var_name. The examples in this guide use @score as the variable name. Any valid name for a user-defined variable is permitted, for example @my_score.
- options: A set of options in JSON format. This parameter only supports the anomaly_detection and recommendation tasks. For all other tasks, set this parameter to NULL.
  - threshold: The optional threshold for use with the anomaly_detection and recommendation tasks.
    Use with the anomaly_detection task to convert anomaly scores to 1 (an anomaly) or 0 (normal). 0 < threshold < 1. The default value is the (1 - contamination)-th percentile of all the anomaly scores.
    Use with the recommendation task and ranking metrics to define positive feedback and a relevant sample. All rankings at or above the threshold are implied to provide positive feedback. All rankings below the threshold are implied to provide negative feedback. The default value is 1.
  - topk: The optional top K rows for use with the anomaly_detection and recommendation tasks. A positive integer between 1 and the table length.
    For the anomaly_detection task, the results include the top K rows with the highest anomaly scores. If topk is not set, ML_SCORE uses threshold.
    For an anomaly_detection task, do not set both threshold and topk. Use threshold or topk, or set options to NULL.
    For the recommendation task and ranking metrics, topk is the number of recommendations to provide. The default is 3.
    A recommendation task with ranking metrics can use both threshold and topk.
  - remove_seen: If the input table overlaps with the training table, and remove_seen is true, then the model will not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.
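The effect of threshold and topk on anomaly scores can be sketched in Python as follows. This is a minimal illustration, not the routine's implementation: the function names and sample scores are hypothetical, treating a score equal to the threshold as an anomaly is an assumption, and flagging the top K rows as anomalies is one reading of "the results include the top K rows with the highest anomaly scores".

```python
def label_by_threshold(scores, threshold):
    """Map anomaly scores to 1 (anomaly) or 0 (normal) using a threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must satisfy 0 < threshold < 1")
    # Assumption: a score exactly equal to the threshold counts as an anomaly.
    return [1 if s >= threshold else 0 for s in scores]

def label_by_topk(scores, topk):
    """Flag the topk rows with the highest anomaly scores as 1, the rest as 0."""
    if not 1 <= topk <= len(scores):
        raise ValueError("topk must be between 1 and the table length")
    # Row indexes ordered from highest to lowest anomaly score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:topk])
    return [1 if i in flagged else 0 for i in range(len(scores))]

# Hypothetical anomaly scores for five rows.
scores = [0.10, 0.95, 0.40, 0.80, 0.05]
by_threshold = label_by_threshold(scores, 0.75)  # [0, 1, 0, 1, 0]
by_topk = label_by_topk(scores, 1)               # [0, 1, 0, 0, 0]
```

The two functions are alternatives, which mirrors the rule that an anomaly_detection task sets either threshold or topk, not both.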
The following example runs ML_SCORE on the ml_data.iris_validate table to determine model quality:
mysql> CALL sys.ML_SCORE('ml_data.iris_validate', 'class', @iris_model, 'balanced_accuracy', @score, NULL);
mysql> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.9583333134651184 |
+--------------------+
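For tasks that accept options, the CALL statement might be assembled from a client program along these lines. The Python helpers below are hypothetical and only build the statement text; they do not connect to a server. The table ml_data.ratings_validate, the rating column, the @rec_model variable, and the precision_at_k metric name are assumptions used for illustration, as is passing threshold and topk as numeric literals. Check the metrics section for the names that apply to your task.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (minimal; trusted input only)."""
    if value is None:
        return "NULL"
    # bool is checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def build_ml_score_call(table, target, handle_var, metric,
                        score_var="@score", options=None):
    """Build the text of a sys.ML_SCORE call (hypothetical helper)."""
    if options:
        pairs = ", ".join(
            f"{sql_literal(key)}, {sql_literal(val)}"
            for key, val in options.items()
        )
        options_sql = f"JSON_OBJECT({pairs})"
    else:
        # Tasks other than anomaly_detection and recommendation pass NULL.
        options_sql = "NULL"
    return (
        f"CALL sys.ML_SCORE({sql_literal(table)}, {sql_literal(target)}, "
        f"{handle_var}, {sql_literal(metric)}, {score_var}, {options_sql});"
    )

# Reproduces the classification example from this section.
classification = build_ml_score_call(
    "ml_data.iris_validate", "class", "@iris_model", "balanced_accuracy"
)

# Hypothetical recommendation call using both threshold and topk.
recommendation = build_ml_score_call(
    "ml_data.ratings_validate", "rating", "@rec_model", "precision_at_k",
    options={"threshold": 3, "topk": 5},
)
```

The quoting here is deliberately simple and suits only trusted, hard-coded values; a real client would rely on its connector's parameter binding where possible.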
See also: