To add topic modeling when using the ML_TRAIN routine, set the document_column parameter as a key-value pair in the options argument. This parameter names the column that contains the text that topic modeling training uses to generate topics and tags. The output is an array of word groups that best characterize the text.
When HeatWave AutoML runs topic modeling, the operation is based on a single algorithm that does not require the tuning of hyperparameters. Moreover, topic modeling is an unsupervised task, which means there are no labels. Therefore, the following options are not supported for topic modeling:
model_list
optimization_metric
exclude_model_list
exclude_column_list
include_column_list
The following example runs ML_TRAIN and includes the document_column option to add topic modeling to the training:
mysql> CALL sys.ML_TRAIN('schema_name.table_name', NULL, JSON_OBJECT('task', 'topic_modeling',
'document_column', 'column_name'), @topic_modeling);
Where:
schema_name is the database name that contains the table. Update this with the appropriate database.
table_name is the table name that contains the data to analyze. Update this with the appropriate table name.
The target column argument is set to NULL because topic modeling is an unsupervised task and does not need labeled data to train the model.
JSON_OBJECT('task', 'topic_modeling', 'document_column', 'column_name') specifies the machine learning task and the text to train on.
The task must be set to topic_modeling.
The document_column represents the name of the column that contains the text to train on. Update column_name with the appropriate column name.
@topic_modeling is the name of the user-defined session variable that stores the model handle for the duration of the connection. You can customize this name to your preference.
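As an illustration, the call might look like the following once the placeholders are filled in. The schema reviews_db, the table product_reviews, and the column review_text are hypothetical names chosen for this sketch; substitute your own.

```sql
-- Hypothetical names: reviews_db (schema), product_reviews (table),
-- review_text (text column). Replace them with your own objects.
CALL sys.ML_TRAIN(
  'reviews_db.product_reviews',   -- table that holds the text to analyze
  NULL,                           -- no target column: the task is unsupervised
  JSON_OBJECT('task', 'topic_modeling',
              'document_column', 'review_text'),
  @topic_modeling                 -- session variable that receives the model handle
);

-- Display the model handle stored in the session variable.
SELECT @topic_modeling;
```

Because the handle lives in a session variable, it is lost when the connection closes, so note the value if you plan to reuse the model from another session.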
Once the model is trained, you can start using it for topic modeling in table and row predictions.
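A minimal sketch of a table prediction follows, assuming the standard HeatWave AutoML routines ML_MODEL_LOAD and ML_PREDICT_TABLE and assuming @topic_modeling still holds the handle returned by ML_TRAIN in the same session. The schema, input table, and output table names are hypothetical, and the exact options accepted can vary by release, so check the reference for your version.

```sql
-- Load the trained model into HeatWave before running predictions.
CALL sys.ML_MODEL_LOAD(@topic_modeling, NULL);

-- Generate topics for every row of the input table.
-- reviews_db.product_reviews and reviews_db.product_reviews_topics
-- are hypothetical names; the output table must not already exist.
CALL sys.ML_PREDICT_TABLE(
  'reviews_db.product_reviews',
  @topic_modeling,
  'reviews_db.product_reviews_topics',
  NULL
);

-- Inspect a few rows of the generated output.
SELECT * FROM reviews_db.product_reviews_topics LIMIT 5;
```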