Related Documentation Download this Manual
PDF (US Ltr) - 1.7Mb
PDF (A4) - 1.7Mb


HeatWave User Guide  /  ...  /  Using an Anomaly Detection Model

3.10.2 Using an Anomaly Detection Model

Anomaly detection models produce anomaly scores, which indicate the probability that a row has an anomaly. Anomaly detection uses a threshold to convert anomaly scores to: an anomaly, which is set to 1, or normal, which is set to 0. There are two methods to set the threshold:

  • A threshold value derived from the ML_TRAIN contamination option.

    Threshold = (1 - contamination)-th percentile of all the anomaly scores.

    The default contamination value is 0.01. The default threshold value based on the default contamination value is the 0.99-th percentile of all the anomaly scores.

  • Set the threshold to a specific value.

    The ML_PREDICT_TABLE, ML_PREDICT_ROW, and ML_SCORE routines include a threshold option: 0 < threshold < 1.

An alternative to threshold is topk. The results include the top K rows with the highest anomaly scores. The ML_PREDICT_TABLE and ML_SCORE routines include the topk option, which is an integer between 1 and the table length.

To detect anomalies, run the ML_PREDICT_ROW or ML_PREDICT_TABLE routines on data with the same columns as the training model.

For ML_SCORE the target_column_name column must only contain the anomaly scores as an integer: 1: an anomaly or 0 normal.

ML_SCORE now includes an options parameter in JSON format. The options are threshold and topk.

For instructions to use the ML_PREDICT_ROW, ML_PREDICT_TABLE, and ML_SCORE routines, see Section 3.7, “Predictions”, and Section 3.13.6, “Scoring Models”. For the complete list of option descriptions, see ML_PREDICT_ROW, ML_PREDICT_TABLE, and ML_SCORE.

ML_EXPLAIN, ML_EXPLAIN_ROW and ML_EXPLAIN_TABLE do not support anomaly detection. A call to any of these routines with an anomaly detection model will produce an error.

Syntax Examples

  • An anomaly detection example that uses the roc_auc metric for ML_SCORE.

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              NULL, JSON_OBJECT('task', 'anomaly_detection', 
              'exclude_column_list', JSON_ARRAY('target')), 
              @anomaly);
    Query OK, 0 rows affected (46.59 sec)
      
    mysql> CALL sys.ML_MODEL_LOAD(@anomaly, NULL);
    Query OK, 0 rows affected (3.23 sec)
      
    mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions',NULL);
    Query OK, 0 rows affected (10.28 sec)      
     
    mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions LIMIT 5;
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    | _id | V1   | V2   | V3       | target | ml_results                                                                             |
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    |   1 |  128 |  802 |  0.47255 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    |   2 |  631 |  642 | 0.387302 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} |
    |   3 |  438 |  959 | 0.556034 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} |
    |   4 |  473 |  779 | 0.407626 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} |
    |   5 |   67 |  933 | 0.383843 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    5 rows in set (0.00 sec)
      
    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'roc_auc', @score, NULL);
    Query OK, 0 rows affected (5.84 sec)
    
    mysql> SELECT @score;
    +--------------------+
    | @score             |
    +--------------------+
    | 0.7465642094612122 |
    +--------------------+
    1 row in set (0.00 sec)
  • A ML_PREDICT_ROW example that uses default options.

    mysql> CALL sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL); 
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    | sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL)                                                                   |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    | {"V1": 438.0, "V2": 959.0, "V3": 0.556034, "ml_results": "{'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}}"} |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (5.35 sec)
  • A ML_PREDICT_TABLE example that uses the threshold option set to 1%. All rows shown have probabilities of being an anomaly above 1%, and are predicted to be anomalies.

    mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions_threshold', 
              JSON_OBJECT('threshold', 0.01));
    Query OK, 0 rows affected (12.77 sec)
    
    mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions_threshold LIMIT 5;
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    | _id | V1   | V2   | V3       | target | ml_results                                                                             |
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    |   1 |  128 |  802 |  0.47255 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    |   2 |  631 |  642 | 0.387302 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} |
    |   3 |  438 |  959 | 0.556034 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} |
    |   4 |  473 |  779 | 0.407626 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} |
    |   5 |   67 |  933 | 0.383843 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    5 rows in set (0.00 sec)
  • An ML_SCORE example that uses the accuracy metric with a threshold set to 90%.

    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
    Query OK, 0 rows affected (1.86 sec)
     
    mysql> SELECT @score;
    +--------------------+
    | @score             |
    +--------------------+
    | 0.9791129231452942 |
    +--------------------+
    1 row in set (0.00 sec)
  • An ML_SCORE example that uses the precision_at_k metric with a topk value of 10.

    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
    Query OK, 0 rows affected (5.84 sec)      
     
    mysql> SELECT @score;
    +---------------------+
    | @score              |
    +---------------------+
    | 0                   |
    +---------------------+
    1 row in set (0.00 sec)