Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


HeatWave User Guide  /  ...  /  Using an Anomaly Detection Model

3.10.3 Using an Anomaly Detection Model

Anomaly detection models produce anomaly scores, which indicate the probability that a row has an anomaly.

To detect anomalies, run the ML_PREDICT_ROW or ML_PREDICT_TABLE routines on data with the same columns as the training model.

For ML_SCORE the target_column_name column must only contain the anomaly scores as an integer: 1: an anomaly or 0 normal.

Anomaly Detection Model Options

Anomaly detection uses a threshold to convert anomaly scores to: an anomaly, which is set to 1, or normal, which is set to 0. There are two methods to set the threshold:

  • A threshold value derived from the ML_TRAIN contamination option.

    Threshold = (1 - contamination)-th percentile of all the anomaly scores.

    The default contamination value is 0.01. The default threshold value based on the default contamination value is the 0.99-th percentile of all the anomaly scores.

  • Set the threshold to a specific value.

    The ML_PREDICT_TABLE, ML_PREDICT_ROW, and ML_SCORE routines include a threshold option: 0 < threshold < 1.

The following additional options are available:

  • An alternative to threshold is topk. The results include the top K rows with the highest anomaly scores. The ML_PREDICT_TABLE and ML_SCORE routines include the topk option, which is an integer between 1 and the table length.

  • ML_SCORE includes an options parameter in JSON format. The options are threshold and topk.

  • When running a semi-supervised model, the ML_PREDICT_ROW, ML_PREDICT_TABLE, and ML_SCORE routines have the supervised_submodel_weight option. It allows you to override the ensemble_score weighting estimated during ML_TRAIN with a new value. The value must be greater than 0 and less than 1.0.

Syntax Examples for Unsupervised Learning

  • An anomaly detection example that uses the roc_auc metric for ML_SCORE.

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              NULL, JSON_OBJECT('task', 'anomaly_detection', 
              'exclude_column_list', JSON_ARRAY('target')), 
              @anomaly);
    Query OK, 0 rows affected (46.59 sec)
      
    mysql> CALL sys.ML_MODEL_LOAD(@anomaly, NULL);
    Query OK, 0 rows affected (3.23 sec)
      
    mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions',NULL);
    Query OK, 0 rows affected (10.28 sec)      
     
    mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions LIMIT 5;
    +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | V1   | V2   | V3       | target | ml_results                                                                             |
    +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+
    |                 1 |  128 |  802 |  0.47255 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    |                 2 |  631 |  642 | 0.387302 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} |
    |                 3 |  438 |  959 | 0.556034 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} |
    |                 4 |  473 |  779 | 0.407626 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} |
    |                 5 |   67 |  933 | 0.383843 |      0 | {'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    +-----+------+------+----------+--------+----------------------------------------------------------------------------------------+
    5 rows in set (0.00 sec)
      
    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'roc_auc', @score, NULL);
    Query OK, 0 rows affected (5.84 sec)
    
    mysql> SELECT @score;
    +--------------------+
    | @score             |
    +--------------------+
    | 0.7465642094612122 |
    +--------------------+
    1 row in set (0.00 sec)
  • An ML_PREDICT_ROW example that uses default options.

    mysql> SELECT sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL); 
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    | sys.ML_PREDICT_ROW('{"V1": 438.0, "V2": 959.0, "V3": 0.556034}', @anomaly, NULL)                                                                   |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    | {"V1": 438.0, "V2": 959.0, "V3": 0.556034, "ml_results": "{'predictions': {'is_anomaly': 0}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}}"} |
    +----------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (5.35 sec)
  • An ML_PREDICT_TABLE example that uses the threshold option set to 1%. All rows shown have probabilities of being an anomaly above 1%, and are predicted to be anomalies.

    mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              @anomaly, 'mlcorpus_anomaly_detection.volcanoes-predictions_threshold', 
              JSON_OBJECT('threshold', 0.01));
    Query OK, 0 rows affected (12.77 sec)
    
    mysql> SELECT * FROM mlcorpus_anomaly_detection.volcanoes-predictions_threshold LIMIT 5;
    +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | V1   | V2   | V3       | target | ml_results                                                                             |
    +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+
    |                 1 |  128 |  802 |  0.47255 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    |                 2 |  631 |  642 | 0.387302 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.96, 'anomaly': 0.04}} |
    |                 3 |  438 |  959 | 0.556034 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.74, 'anomaly': 0.26}} |
    |                 4 |  473 |  779 | 0.407626 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.87, 'anomaly': 0.13}} |
    |                 5 |   67 |  933 | 0.383843 |      0 | {'predictions': {'is_anomaly': 1}, 'probabilities': {'normal': 0.95, 'anomaly': 0.05}} |
    +-------------------+------+------+----------+--------+----------------------------------------------------------------------------------------+
    5 rows in set (0.00 sec)
  • An ML_SCORE example that uses the accuracy metric with a threshold set to 90%.

    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
    Query OK, 0 rows affected (1.86 sec)
     
    mysql> SELECT @score;
    +--------------------+
    | @score             |
    +--------------------+
    | 0.9791129231452942 |
    +--------------------+
    1 row in set (0.00 sec)
  • An ML_SCORE example that uses the precision_at_k metric with a topk value of 10.

    mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
    Query OK, 0 rows affected (5.84 sec)      
     
    mysql> SELECT @score;
    +---------------------+
    | @score              |
    +---------------------+
    | 0                   |
    +---------------------+
    1 row in set (0.00 sec)

Syntax Examples for Semi-Supervised Learning

  • An ML_PREDICT_ROW example that overrides the ensemble_score value from the ML_TRAIN routine to a new value of 0.8

    mysql> SET @row_input = JSON_OBJECT('V1', 250, 'V2', 525, 'V3', 0.438976);
    mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @semsup_gknn, 
    CAST('{"experimental": {"semisupervised": {"supervised_submodel_weight": 0.8}}}' as JSON));
  • An ML_PREDICT_TABLE example that overrides the ensemble_score value from the ML_TRAIN routine to a new value of 0.5.

    mysql> CALL sys.ML_PREDICT_TABLE('mlcorpus.anomaly_train',
     @semsup_gknn, 'mlcorpus.preds_gknn_weighted', 
    CAST('{"experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));
  • An ML_SCORE example that overrides the ensemble_score value from the ML_TRAIN routine to a new value of 0.5.

    mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", 
    @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted, 
    CAST('{"topk": 10, "experimental": 
    {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));