Anomaly detection models produce anomaly scores, which
indicate the probability that a row has an anomaly. Anomaly
detection uses a threshold to convert anomaly scores to: an
anomaly, which is set to 1
, or normal,
which is set to 0
. There are two methods to
set the threshold:
-
A threshold value derived from the
ML_TRAIN
contamination
option.Threshold = (1 -
contamination
)-th percentile of all the anomaly scores.The default
contamination
value is 0.01. The defaultthreshold
value based on the defaultcontamination
value is the 0.99-th percentile of all the anomaly scores. -
Set the threshold to a specific value.
The
ML_PREDICT_TABLE
,ML_PREDICT_ROW
, andML_SCORE
routines include athreshold
option: 0 <threshold
< 1.
An alternative to threshold
is
topk
. The results include the top K rows
with the highest anomaly scores. The
ML_PREDICT_TABLE
and
ML_SCORE
routines include the
topk
option, which is an integer between 1
and the table length.
To detect anomalies, run the
ML_PREDICT_ROW
or
ML_PREDICT_TABLE
routines on
data with the same columns as the training model.
For ML_SCORE
the
target_column_name
column must only contain
the anomaly scores as an integer: 1
: an
anomaly or 0
normal.
ML_SCORE
now includes an
options parameter in JSON
format. The
options are threshold
and
topk
.
For instructions to use the
ML_PREDICT_ROW
,
ML_PREDICT_TABLE
, and
ML_SCORE
routines, see
Section 3.7, “Predictions”, and
Section 3.13.6, “Scoring Models”. For the complete
list of option descriptions, see
ML_PREDICT_ROW
,
ML_PREDICT_TABLE
, and
ML_SCORE
.
ML_EXPLAIN
,
ML_EXPLAIN_ROW
and
ML_EXPLAIN_TABLE
do not support
anomaly detection. A call to any of these routines with an
anomaly detection model will produce an error.
-
An anomaly detection example that uses the
roc_auc
metric forML_SCORE
.mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', NULL, JSON_OBJECT('task', 'anomaly_detection', 'exclude_column_list', JSON_ARRAY('target')), @anomaly); Query OK, 0 rows affected (46.59 sec) mysql> CALL sys.ML_MODEL_LOAD(@anomaly, NULL); Query OK, 0 rows affected (3.23 sec) mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions',NULL); Query OK, 0 rows affected (10.28 sec) mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions LIMIT 5; +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ | _id | V1 | V2 | V3 | target | ml_results | +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ | 1 | 128 | 802 | 0.47255 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | | 2 | 631 | 642 | 0.387302 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} | | 3 | 438 | 959 | 0.556034 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} | | 4 | 473 | 779 | 0.407626 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} | | 5 | 67 | 933 | 0.383843 | 0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ 5 rows in set (0.00 sec) mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'roc_auc', @score, NULL); Query OK, 0 rows affected (5.84 sec) mysql> SELECT @score; +--------------------+ | @score | +--------------------+ | 0.7465642094612122 | +--------------------+ 1 row in set (0.00 sec)
-
A
ML_PREDICT_ROW
example that uses default options.mysql> CALL sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL); +----------------------------------------------------------------------------------------------------------------------------------------------------+ | sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL) | +----------------------------------------------------------------------------------------------------------------------------------------------------+ | {"V1": 438.0, "V2": 959.0, "V3": 0.556034, "ml_results": "{'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}}"} | +----------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (5.35 sec)
-
A
ML_PREDICT_TABLE
example that uses thethreshold
option set to 1%. All rows shown have probabilities of being an anomaly above 1%, and are predicted to be anomalies.mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions_threshold', JSON_OBJECT('threshold', 0.01)); Query OK, 0 rows affected (12.77 sec) mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions_threshold LIMIT 5; +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ | _id | V1 | V2 | V3 | target | ml_results | +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ | 1 | 128 | 802 | 0.47255 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | | 2 | 631 | 642 | 0.387302 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} | | 3 | 438 | 959 | 0.556034 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} | | 4 | 473 | 779 | 0.407626 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} | | 5 | 67 | 933 | 0.383843 | 0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} | +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+ 5 rows in set (0.00 sec)
-
An
ML_SCORE
example that uses theaccuracy
metric with athreshold
set to 90%.mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9)); Query OK, 0 rows affected (1.86 sec) mysql> SELECT @score; +--------------------+ | @score | +--------------------+ | 0.9791129231452942 | +--------------------+ 1 row in set (0.00 sec)
-
An
ML_SCORE
example that uses theprecision_at_k
metric with atopk
value of 10.mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10)); Query OK, 0 rows affected (5.84 sec) mysql> SELECT @score; +---------------------+ | @score | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec)