3.10.1 Training an Anomaly Detection Model

The GkNN anomaly detection algorithm does not require labeled data to train a model, so the target_column_name parameter of ML_TRAIN must be set to NULL.

Anomaly detection introduces an optional contamination factor, which is an estimate of the fraction of rows in the training table that are outliers.

Contamination factor := estimated number of rows with anomalies / total number of rows in the training table

To add a contamination factor, use the contamination option.
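
As a worked illustration of the formula above, the following sketch computes a contamination factor from assumed figures and checks it against the range permitted for the contamination option. The row counts and the user variable names (@estimated_anomalies, @total_rows, @contamination) are hypothetical and are not taken from the example dataset.

```sql
-- Illustrative figures only: assume about 13 anomalous rows
-- are expected in a training table of 1000 rows.
SET @estimated_anomalies = 13;
SET @total_rows = 1000;

-- Contamination factor = estimated rows with anomalies / total rows.
SET @contamination = @estimated_anomalies / @total_rows;

-- The contamination option must satisfy 0 < contamination < 0.5.
-- in_valid_range is 1 when the computed value is acceptable, 0 otherwise.
SELECT @contamination AS contamination,
       (@contamination > 0 AND @contamination < 0.5) AS in_valid_range;
```

With these assumed figures the factor is 13 / 1000 = 0.013, which lies inside the permitted range and can be supplied as the contamination option to ML_TRAIN.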

Run the ML_TRAIN routine to create an anomaly detection model, and use the following JSON options:

  • task: anomaly_detection: Specifies the machine learning task.

  • contamination: The contamination factor. The value must satisfy 0 < contamination < 0.5. The default value is 0.01.

  • model_list: not supported because GkNN is currently the only supported algorithm.

  • exclude_model_list: not supported because GkNN is currently the only supported algorithm.

  • optimization_metric: not supported because the GkNN algorithm does not require labeled data.

Specifying model_list, exclude_model_list, or optimization_metric for an anomaly detection model produces an error.
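
Because unsupported options cause training to fail, it can help to inspect the options object before calling ML_TRAIN. The following is a minimal sketch that uses standard MySQL JSON functions. The @options variable name is an assumption for illustration, and the check is a convenience, not part of the ML_TRAIN routine.

```sql
-- Build the options object for an anomaly detection model.
-- (@options is a hypothetical variable name.)
SET @options = JSON_OBJECT('task', 'anomaly_detection',
                           'contamination', 0.013);

-- Returns 1 if any option that anomaly detection does not support
-- is present in the object, otherwise 0.
SELECT JSON_CONTAINS_PATH(@options, 'one',
                          '$.model_list',
                          '$.exclude_model_list',
                          '$.optimization_metric') AS has_unsupported_option;
```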

For general training instructions, see Section 3.5, “Training a Model”. For full details of all the options, see ML_TRAIN.

Syntax Examples

  • An ML_TRAIN example that specifies the anomaly_detection task type:

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              NULL, JSON_OBJECT('task', 'anomaly_detection', 
              'exclude_column_list', JSON_ARRAY('target')), 
              @anomaly);
    Query OK, 0 rows affected (46.59 sec)

  • An ML_TRAIN example that specifies the anomaly_detection task with a contamination option. Access the model catalog metadata to check the value of the contamination option.

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 
              NULL, JSON_OBJECT('task', 'anomaly_detection', 'contamination', 0.013, 
              'exclude_column_list', JSON_ARRAY('target')), 
              @anomaly_with_contamination);
    Query OK, 0 rows affected (50.22 sec)
    
    mysql> SELECT JSON_EXTRACT(model_metadata, '$.contamination') 
              FROM ML_SCHEMA_root.MODEL_CATALOG 
              WHERE model_handle = @anomaly_with_contamination;
    +-------------------------------------------------+
    | JSON_EXTRACT(model_metadata, '$.contamination') |
    +-------------------------------------------------+
    | 0.013000000268220901                            |
    +-------------------------------------------------+
    1 row in set (0.00 sec)