MySQL 9.2.2 introduces the ability to detect anomalies in log data. Before anomaly detection runs, the log data is cleaned, segmented, and encoded. This feature leverages the log template miner Drain3.
Consider the following when running anomaly detection on logs.
The input table can only have the following columns:
- The column containing the logs.
- If including logs from different sources, a column containing the source of each log. The values in this column contain the names of the sources that each log belongs to. These values are used to group each host's logs together. If this column is not present, it is assumed that all logs originate from the same source.
- If including labeled data, a column identifying the labeled log lines. See Semi-supervised Anomaly Detection to learn more.

If the input table has columns other than the ones permitted, you must use the exclude_column_list option when running ML_TRAIN to exclude the irrelevant columns.

The data collected for anomaly detection can be unsupervised or semi-supervised. To run semi-supervised anomaly detection, you can provide a separate column in the input table with labels for the labeled log lines. This column labels identified anomalous logs with a value of 1, non-anomalous logs with 0, and unlabeled logs with NULL. See Semi-supervised Anomaly Detection to learn more.
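To make the permitted column layout concrete, the following is a minimal client-side sketch in Python of rows that carry a log line, an optional source, and an optional label. The column names log, source, and label, the sample rows, and the helper group_by_source are hypothetical and chosen only for illustration; they are not prescribed by HeatWave. Python's None stands in for SQL NULL.

```python
from collections import defaultdict

# Hypothetical rows; column names and values are illustrative assumptions.
rows = [
    {"log": "Accepted password for admin", "source": "host-a", "label": 0},
    {"log": "Failed password for root", "source": "host-a", "label": 1},
    {"log": "Connection closed by peer", "source": "host-b", "label": None},
]

# 1 = anomalous, 0 = non-anomalous, None = unlabeled (SQL NULL).
ALLOWED_LABELS = {0, 1, None}


def group_by_source(rows, default_source="single-source"):
    """Group log lines by source; rows without a source share one group."""
    groups = defaultdict(list)
    for row in rows:
        if row.get("label") not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {row.get('label')!r}")
        groups[row.get("source", default_source)].append(row["log"])
    return dict(groups)


grouped = group_by_source(rows)
```

A check like this can catch unexpected label values before the data is loaded into the input table. It does not replace the validation that ML_TRAIN performs.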
In addition to the anomaly scores included in the output table, you can use HeatWave GenAI to generate textual log summaries.
By default, the following parameters are masked in the input data (training or test data): IP, DATETIME, TIME, HEX, IPPORT, and OCID. You can mask additional regex patterns with the additional_masking_regex option.
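As a rough illustration of what masking does to a log line, the sketch below replaces a few parameter types with placeholder tokens using Python's re module. The regular expressions, the placeholder format, and the mask_line helper are assumptions made for this example. They are not the patterns or the implementation that HeatWave uses, and only three of the six default parameter types are covered.

```python
import re

# Illustrative patterns only; NOT the patterns HeatWave applies internally.
# IPPORT is listed before IP so that the port is not left behind.
MASKS = [
    ("IPPORT", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("HEX", re.compile(r"\b0x[0-9a-fA-F]+\b")),
]


def mask_line(line, extra_patterns=()):
    """Replace variable parameters with tokens so similar lines look alike."""
    for name, pattern in MASKS:
        line = pattern.sub(f"<{name}>", line)
    # Hypothetical stand-in for user-supplied additional masking patterns.
    for raw in extra_patterns:
        line = re.sub(raw, "<CUSTOM>", line)
    return line


masked = mask_line(
    "conn from 192.168.1.10:5432 id=0x1f4a user=u-12345",
    extra_patterns=[r"\bu-\d+\b"],
)
print(masked)
```

Masking variable fields in this way lets lines that differ only in their parameters collapse to the same template, which is why an extra pattern is useful when logs contain identifiers specific to your environment.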