HeatWave User Guide

3.13.1 Training a Model with Topic Modeling

To train a topic modeling model with the ML_TRAIN routine, pass the document_column parameter as a key-value pair in the options argument. Its value is the name of the column that contains the text that topic modeling uses to generate topics and tags. The output is an array of word groups that best characterize the text.

HeatWave AutoML topic modeling uses a single algorithm that requires no hyperparameter tuning. Topic modeling is also an unsupervised task, so the training data has no labels. For these reasons, the following options are not supported for topic modeling:

  • model_list

  • optimization_metric

  • exclude_model_list

  • exclude_column_list

  • include_column_list

The following example calls ML_TRAIN with the topic_modeling task and the document_column option:

mysql> CALL sys.ML_TRAIN('schema_name.table_name', NULL, JSON_OBJECT('task', 'topic_modeling',
 'document_column', 'column_name'), @topic_modeling);

Where:

  • schema_name is the name of the database that contains the table. Replace it with your database name.

  • table_name is the name of the table that contains the data to analyze. Replace it with your table name.

  • The target column argument is set to NULL because topic modeling is an unsupervised task and does not need labeled data to train the model.

  • JSON_OBJECT('task', 'topic_modeling', 'document_column', 'column_name') specifies the machine learning task and the column that contains the text to train on:

      • task must be set to topic_modeling.

      • document_column is the name of the column that contains the text to train on. Replace column_name with your column name.

  • @topic_modeling is the name of the user-defined session variable that stores the model handle for the duration of the connection. You can customize this name to your preference.
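
As a concrete illustration of the placeholders above, the following sketch assumes a hypothetical database named demo_db with an existing, populated table named support_tickets whose ticket_text column holds free-form text. These names, and the session variable @ticket_topics, are illustrative assumptions only; substitute your own.

```sql
-- Hypothetical names: demo_db, support_tickets, ticket_text, @ticket_topics.
-- Assumes the table already exists and contains the text to analyze.
CALL sys.ML_TRAIN(
  'demo_db.support_tickets',            -- schema_name.table_name
  NULL,                                 -- no target column: unsupervised task
  JSON_OBJECT(
    'task', 'topic_modeling',           -- required task value
    'document_column', 'ticket_text'    -- column that holds the text
  ),
  @ticket_topics                        -- receives the model handle
);

-- Inspect the model handle stored in the session variable.
SELECT @ticket_topics;
```

Because the handle is held in a session variable, it is available only for the duration of the connection.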

Once the model is trained, you can start using it for topic modeling in table and row predictions.
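
A minimal sketch of that next step, assuming the HeatWave AutoML routines ML_MODEL_LOAD and ML_PREDICT_TABLE and reusing the placeholders from the training example. The output table name output_table_name is a placeholder, and passing NULL for the options argument is an assumption; check the routine reference for the options that apply to topic modeling.

```sql
-- Load the trained model before using it.
-- The second argument is passed as NULL here (assumption).
CALL sys.ML_MODEL_LOAD(@topic_modeling, NULL);

-- Generate topics for each row of the input table and write the results
-- to a new table. output_table_name is a placeholder; NULL means no
-- extra options are passed (assumption).
CALL sys.ML_PREDICT_TABLE(
  'schema_name.table_name',
  @topic_modeling,
  'schema_name.output_table_name',
  NULL
);

-- Review a few rows of the generated output.
SELECT * FROM schema_name.output_table_name LIMIT 5;
```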