The GkNN anomaly detection algorithm does not require labeled data to train a model. The target_column_name parameter must be set to NULL.
Anomaly detection introduces an optional contamination factor, which represents an estimate of the proportion of outliers in the training table:

Contamination factor := estimated number of rows with anomalies / total number of rows in the training table

To add a contamination factor, use the contamination option.
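As a worked illustration of the formula, the Python sketch below computes a contamination factor from row counts and checks it against the documented range 0 < contamination < 0.5. The function names and the row counts (13 suspected anomalies in a 1,000-row table) are hypothetical. The estimated number of anomalous rows is an input you supply from your own knowledge of the data.

```python
def contamination_factor(anomalous_rows: int, total_rows: int) -> float:
    """Estimated anomalous rows divided by total rows in the training table."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= anomalous_rows <= total_rows:
        raise ValueError("anomalous_rows must be between 0 and total_rows")
    return anomalous_rows / total_rows


def check_contamination(value: float) -> float:
    """Reject values outside the documented open interval 0 < contamination < 0.5."""
    if not 0 < value < 0.5:
        raise ValueError(f"contamination must satisfy 0 < c < 0.5, got {value}")
    return value


if __name__ == "__main__":
    # Hypothetical counts: 13 suspected anomalies in a 1,000-row training table.
    c = check_contamination(contamination_factor(13, 1000))
    print(c)  # 0.013
```

The resulting value is what you would pass as the contamination option. A value of 0.5 or more, or 0, is outside the documented range.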
Run the ML_TRAIN routine to create an anomaly detection model, and use the following JSON options:

- task: anomaly_detection. Specifies the machine learning task.
- contamination: 0 < contamination < 0.5. The default value is 0.01.
- model_list: Not supported, because GkNN is currently the only supported algorithm.
- exclude_model_list: Not supported, because GkNN is currently the only supported algorithm.
- optimization_metric: Not supported, because the GkNN algorithm does not require labeled data.

Using model_list, exclude_model_list, or optimization_metric produces an error.
See Section 3.5, “Training a Model”. For full details of all the options, see ML_TRAIN.
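The option rules listed above can be checked on the client before calling ML_TRAIN. The sketch below is a hypothetical Python helper and is not part of HeatWave AutoML. It builds the options as JSON text, rejects model_list, exclude_model_list, and optimization_metric, and enforces 0 < contamination < 0.5 when contamination is given. The server still performs its own validation. Any other keys, such as exclude_column_list, pass through unchecked.

```python
import json

# Options that this section says produce an error for anomaly detection.
UNSUPPORTED_KEYS = {"model_list", "exclude_model_list", "optimization_metric"}


def anomaly_detection_options(**extra) -> str:
    """Return ML_TRAIN options for the anomaly_detection task as JSON text.

    Hypothetical client-side helper mirroring the documented rules; it is not
    part of HeatWave AutoML, and the server still validates the options itself.
    """
    if extra.get("task", "anomaly_detection") != "anomaly_detection":
        raise ValueError("task must be 'anomaly_detection'")
    bad = UNSUPPORTED_KEYS.intersection(extra)
    if bad:
        raise ValueError(f"not supported for anomaly_detection: {sorted(bad)}")
    if "contamination" in extra and not 0 < extra["contamination"] < 0.5:
        raise ValueError("contamination must satisfy 0 < contamination < 0.5")
    # If contamination is omitted, the documented default (0.01) applies server-side.
    options = {"task": "anomaly_detection", **extra}
    return json.dumps(options)


if __name__ == "__main__":
    print(anomaly_detection_options(contamination=0.013,
                                    exclude_column_list=["target"]))
    # {"task": "anomaly_detection", "contamination": 0.013, "exclude_column_list": ["target"]}
```

The printed JSON has the same keys and values as the JSON_OBJECT(...) argument in the second ML_TRAIN example in this section.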
- An ML_TRAIN example that specifies the anomaly_detection task type:

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', NULL,
              JSON_OBJECT('task', 'anomaly_detection',
                          'exclude_column_list', JSON_ARRAY('target')),
              @anomaly);
    Query OK, 0 rows affected (46.59 sec)
- An ML_TRAIN example that specifies the anomaly_detection task with a contamination option. Access the model catalog metadata to check the value of the contamination option:

    mysql> CALL sys.ML_TRAIN('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', NULL,
              JSON_OBJECT('task', 'anomaly_detection', 'contamination', 0.013,
                          'exclude_column_list', JSON_ARRAY('target')),
              @anomaly_with_contamination);
    Query OK, 0 rows affected (50.22 sec)

    mysql> SELECT JSON_EXTRACT(model_metadata, '$.contamination')
           FROM ML_SCHEMA_root.MODEL_CATALOG
           WHERE model_handle = @anomaly_with_contamination;
    +-------------------------------------------------+
    | JSON_EXTRACT(model_metadata, '$.contamination') |
    +-------------------------------------------------+
    | 0.013000000268220901                            |
    +-------------------------------------------------+
    1 row in set (0.00 sec)
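The catalog returns 0.013000000268220901 rather than 0.013. This is consistent with the option being stored at single (32-bit) precision. This section does not say how the value is stored, so treat that as an assumption. The Python sketch below shows only that rounding 0.013 to the nearest IEEE 754 single-precision number gives exactly the digits in the query output. The helper name to_float32 is hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    stored = to_float32(0.013)
    print(stored)            # 0.013000000268220901
    print(stored != 0.013)   # True: the single-precision value is not exactly 0.013
    print(stored - 0.013)    # small positive rounding error, roughly 2.7e-10
```

If you compare the catalog value with the value you passed to ML_TRAIN, a tolerance-based comparison is safer than exact equality. An example is checking that the absolute difference is below 1e-6.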