ML_SCORE scores a model by generating predictions using the feature columns in a labeled dataset as input and comparing the predictions to the ground truth values in the target column of the labeled dataset. The dataset used with ML_SCORE should have the same feature columns as the dataset used to train the model, but the data should be different. For example, you might reserve 20 to 30 percent of the labeled training data for scoring. ML_SCORE returns a computed metric indicating the quality of the model.
mysql> CALL sys.ML_SCORE(table_name, target_column_name, model_handle, metric, score, [options]);
options: {
    JSON_OBJECT("key","value"[,"key","value"] ...)
        "key","value": {
            ['threshold', 'N']
            ['topk', 'N']
            ['remove_seen', {'true'|'false'}]
            ['item_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)]
            ['user_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)]
        }
}
Set the following required parameters.
- table_name: Specifies the fully qualified name of the table used to compute model quality (database_name.table_name). The table must contain the same columns as the training dataset.
- target_column_name: If scoring a supervised or semi-supervised model, specify the name of the target column containing ground truth values. If scoring an unsupervised model, set to NULL. See AutoML Learning Types.
- model_handle: Specifies the model handle or a session variable containing the model handle. See Work with Model Handles.
- metric: Specifies the name of the metric. The selected metric must be compatible with the task type used for training the model. See Optimization and Scoring Metrics.
- score: Specifies the user-defined variable name for the computed score. The ML_SCORE routine populates the variable. User variables are written as @var_name. Any valid name for a user-defined variable is permitted.
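For illustration, a minimal call might look like the following sketch. The scoring table name census_data.census_test is an assumed placeholder for a reserved portion of the labeled data; the target column, model handle, and metric match the first example at the end of this section.

mysql> SET @census_model = 'census_data.census_train_admin_1745439945171';
mysql> CALL sys.ML_SCORE('census_data.census_test', 'revenue', @census_model, 'balanced_accuracy', @score, NULL);
mysql> SELECT @score;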
The following options in JSON format are
available for recommendation and anomaly detection models.
Set the following options as needed for recommendation models.
- threshold: The optional threshold that defines positive feedback and a relevant sample. Only use it with ranking metrics. It can be used for either explicit or implicit feedback.
- topk: The optional top number of recommendations to provide. The default is 3. Set a positive integer between 1 and the number of rows in the table. A recommendation task and ranking metrics can use both threshold and topk.
- remove_seen: If the input table overlaps with the training table, and remove_seen is true, the model does not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.
- item_metadata: Defines the table that has item descriptions. It is a JSON object with the table_name option as a key, which specifies the table that has item descriptions. One column must be the same as the item_id in the input table.
- user_metadata: Defines the table that has user descriptions. It is a JSON object with the table_name option as a key, which specifies the table that has user descriptions. One column must be the same as the user_id in the input table.
- table_name: Used with the item_metadata and user_metadata options. It specifies the table that has item or user descriptions, as a string in fully qualified format (schema_name.table_name).
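For illustration, a scoring call for a recommendation model might combine these options as in the following sketch. The table names, model handle, and target column (mlcorpus.ratings_test, mlcorpus.item_descriptions, @recommendation_model, rating) are assumed placeholders, and precision_at_k is used here as an assumed ranking metric; see Optimization and Scoring Metrics for the metrics compatible with the recommendation task.

mysql> CALL sys.ML_SCORE('mlcorpus.ratings_test', 'rating', @recommendation_model, 'precision_at_k', @recommendation_score, JSON_OBJECT('topk', 10, 'remove_seen', 'false', 'item_metadata', JSON_OBJECT('table_name', 'mlcorpus.item_descriptions')));
mysql> SELECT @recommendation_score;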
Set the following options as needed for anomaly detection models.
- threshold: The threshold you set on anomaly detection models determines which rows in the output table are labeled as anomalies, with an anomaly score of 1, or as normal, with an anomaly score of 0. The value for the threshold is the degree to which a row of data or log segment is considered for anomaly detection. Any sample with an anomaly score above the threshold is classified as an anomaly. The default value is the (1 - contamination)-th percentile of all the anomaly scores.
- topk: The optional top K rows to display with the highest anomaly scores. Set a positive integer between 1 and the number of rows in the table. If topk is not set, ML_SCORE uses threshold.

Do not set both threshold and topk. Use threshold or topk, or set options to NULL.
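For illustration, scoring an anomaly detection model with the default threshold only requires setting options to NULL, as in the following sketch, which reuses the anomaly detection table and model handle from the examples at the end of this section.

mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, NULL);
mysql> SELECT @score;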
The following example generates a score by using the balanced_accuracy metric. Query the score with the session variable for the ML_SCORE routine.

mysql> CALL sys.ML_SCORE('census_data.census_train', 'revenue', 'census_data.census_train_admin_1745439945171', 'balanced_accuracy', @score, NULL);
Query OK, 0 rows affected (3.0536 sec)

mysql> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.8151071071624756 |
+--------------------+
1 row in set (0.0411 sec)
The following example uses the accuracy metric with a threshold set to 90%.

mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
Query OK, 0 rows affected (1.86 sec)

mysql> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.9791129231452942 |
+--------------------+
1 row in set (0.00 sec)
The following example uses the precision_at_k metric with a topk value of 10.

mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
Query OK, 0 rows affected (5.84 sec)

mysql> SELECT @score;
+---------------------+
| @score              |
+---------------------+
| 0                   |
+---------------------+
1 row in set (0.00 sec)
The following example scores a semi-supervised model and overrides the ensemble weighting set by the ML_TRAIN routine by setting the experimental supervised_submodel_weight option to a new value of 0.5.

mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', 'target', @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted, CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' AS JSON));
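As in the previous examples, query the computed score from the session variable:

mysql> SELECT @semsup_score_gknn_weighted;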