Anomaly detection models produce anomaly scores, which indicate the probability that a row has an anomaly.
To detect anomalies, run the
ML_PREDICT_ROW
or
ML_PREDICT_TABLE
routines on
data with the same columns as the training model.
For ML_SCORE
the
target_column_name
column must only contain
the anomaly scores as an integer: 1
: an
anomaly or 0
normal.
Anomaly detection uses a threshold to convert anomaly scores
to: an anomaly, which is set to 1
, or
normal, which is set to 0
. There are two
methods to set the threshold:
-
A threshold value derived from the
ML_TRAIN
contamination
option.Threshold = (1 -
contamination
)-th percentile of all the anomaly scores.The default
contamination
value is 0.01. The defaultthreshold
value based on the defaultcontamination
value is the 0.99-th percentile of all the anomaly scores. -
Set the threshold to a specific value.
The
ML_PREDICT_TABLE
,ML_PREDICT_ROW
, andML_SCORE
routines include athreshold
option: 0 <threshold
< 1.
The following additional options are available:
An alternative to
threshold
istopk
. The results include the top K rows with the highest anomaly scores. TheML_PREDICT_TABLE
andML_SCORE
routines include thetopk
option, which is an integer between 1 and the table length.ML_SCORE
includes an options parameter inJSON
format. The options arethreshold
andtopk
.When running a semi-supervised model, the
ML_PREDICT_ROW
,ML_PREDICT_TABLE
, andML_SCORE
routines have thesupervised_submodel_weight
option. It allows you to override theensemble_score
weighting estimated duringML_TRAIN
with a new value. The value must be greater than 0 and less than 1.0.
-
An anomaly detection example that uses the
roc_auc
metric forML_SCORE
.mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', NULL, JSON_OBJECT('task', 'anomaly_detection', 'exclude_column_list', JSON_ARRAY('target')), @anomaly); Query OK, 0 rows affected (46.59 sec) mysql> CALL sys.ML_MODEL_LOAD(@anomaly, NULL); Query OK, 0 rows affected (3.23 sec) mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions',NULL); Query OK, 0 rows affected (10.28 sec) mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions LIMIT 5; +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+ | _4aad19ca6e_pk_id | V1 | V2 | V3 | target | ml_results | +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+ | 1 | 128 | 802 | 0.47255 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | | 2 | 631 | 642 | 0.387302 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} | | 3 | 438 | 959 | 0.556034 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} | | 4 | 473 | 779 | 0.407626 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} | | 5 | 67 | 933 | 0.383843 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ 5 rows in set (0.00 sec) mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'roc_auc', @score, NULL); Query OK, 0 rows affected (5.84 sec) mysql> SELECT @score; +--------------------+ | @score | +--------------------+ | 0.7465642094612122 | +--------------------+ 1 row in set (0.00 sec)
-
An
ML_PREDICT_ROW
example that uses default options.mysql> SELECT sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL); +----------------------------------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL) | +----------------------------------------------------------------------------------------------------------------------------------------------------+ | {"V1": 438.0, "V2": 959.0, "V3": 0.556034, "ml_results": "{'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}}"} | +----------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (5.35 sec)
-
An
ML_PREDICT_TABLE
example that uses thethreshold
option set to 1%. All rows shown have probabilities of being an anomaly above 1%, and are predicted to be anomalies.mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions_threshold', JSON_OBJECT('threshold', 0.01)); Query OK, 0 rows affected (12.77 sec) mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions_threshold LIMIT 5; +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+ | _4aad19ca6e_pk_id | V1 | V2 | V3 | target | ml_results | +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+ | 1 | 128 | 802 | 0.47255 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | | 2 | 631 | 642 | 0.387302 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} | | 3 | 438 | 959 | 0.556034 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} | | 4 | 473 | 779 | 0.407626 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} | | 5 | 67 | 933 | 0.383843 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+ 5 rows in set (0.00 sec)
-
An
ML_SCORE
example that uses theaccuracy
metric with athreshold
set to 90%.mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9)); Query OK, 0 rows affected (1.86 sec) mysql> SELECT @score; +--------------------+ | @score | +--------------------+ | 0.9791129231452942 | +--------------------+ 1 row in set (0.00 sec)
-
An
ML_SCORE
example that uses theprecision_at_k
metric with atopk
value of 10.mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10)); Query OK, 0 rows affected (5.84 sec) mysql> SELECT @score; +---------------------+ | @score | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec)
-
An
ML_PREDICT_ROW
example that overrides theensemble_score
value from theML_TRAIN
routine to a new value of 0.8mysql> SET @row_input = JSON_OBJECT('V1', 250, 'V2', 525, 'V3', 0.438976); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @semsup_gknn, CAST('{"experimental": {"semisupervised": {"supervised_submodel_weight": 0.8}}}' as JSON));
-
An
ML_PREDICT_TABLE
example that overrides theensemble_score
value from theML_TRAIN
routine to a new value of 0.5.mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus.anomaly_train', @semsup_gknn, 'mlcorpus.preds_gknn_weighted', CAST('{"experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));
-
An
ML_SCORE
example that overrides theensemble_score
value from theML_TRAIN
routine to a new value of 0.5.mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted, CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));