6.7.6.2 Train a Model with Topic Modeling

After preparing the data for topic modeling, you can train the model.

Before You Begin
Requirements for Topic Modeling Training

Define the following required parameters for topic modeling.

  • Set the task parameter to topic_modeling.

  • document_column: Define the column that contains the text that the model uses to generate topics and tags as output. The output is an array of word groups that best characterize the text.
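As a minimal sketch, the two required parameters can be assembled into a JSON options object and inspected before training. The user variable name @options is an illustrative assumption, and the description column name is borrowed from the training example in this section.

```sql
-- Build the options object (the variable name @options is illustrative only).
SET @options = JSON_OBJECT(
  'task', 'topic_modeling',          -- required: selects the topic modeling task
  'document_column', 'description'   -- required: column that contains the text
);

-- Inspect the options before using the same JSON_OBJECT call in ML_TRAIN.
SELECT JSON_PRETTY(@options);
```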

Unsupported Topic Modeling Options

MySQL HeatWave AutoML runs topic modeling with a single algorithm that requires no hyperparameter tuning. Topic modeling is also an unsupervised task, so the training data has no labels. For these reasons, you cannot use the following options for topic modeling:

  • model_list

  • optimization_metric

  • exclude_model_list

  • exclude_column_list

  • include_column_list
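One way to catch an unsupported option before training is to check an options object for the keys listed above. The following is a sketch that uses the standard MySQL JSON_CONTAINS_PATH function. The @options variable is an illustrative assumption and not part of the ML_TRAIN interface.

```sql
-- Options intended for topic modeling (the variable name @options is illustrative only).
SET @options = JSON_OBJECT('task', 'topic_modeling', 'document_column', 'description');

-- Returns 1 if at least one unsupported key is present, 0 otherwise.
SELECT JSON_CONTAINS_PATH(
  @options, 'one',
  '$.model_list',
  '$.optimization_metric',
  '$.exclude_model_list',
  '$.exclude_column_list',
  '$.include_column_list'
) AS has_unsupported_option;
```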

Unsupported Routines

You cannot run the following routines for topic modeling:

Train Model

Train the model with the ML_TRAIN routine on the movies table previously created. Before training the model, it is good practice to define the model handle yourself instead of letting one be generated automatically. See Define Model Handle.

  1. Optionally, set a session variable to the value that you want to use as the model handle.

    mysql> SET @variable = 'model_handle';

    Replace @variable and model_handle with your own definitions. For example:

    mysql> SET @model='topic_modeling_use_case';

    The model handle is set to topic_modeling_use_case.

  2. Run the ML_TRAIN routine.

    mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), model_handle);

    Replace table_name, target_column_name, task_name, and model_handle with your own values.

    The following example runs ML_TRAIN on the dataset previously created.

    mysql> CALL sys.ML_TRAIN('topic_modeling_data.movies', NULL, JSON_OBJECT('task', 'topic_modeling', 'document_column', 'description'), @model);

    Where:

    • topic_modeling_data.movies is the fully qualified name of the table that contains the training dataset (database_name.table_name).

    • NULL is set for the target column because topic modeling uses unlabeled data, so you cannot set a target column.

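    • JSON_OBJECT('task', 'topic_modeling', 'document_column', 'description') specifies the machine learning task type and sets description as the column that contains the text the model uses to generate topics.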

    • @model is the session variable previously set, which defines the model handle as the user-defined name topic_modeling_use_case. If you do not define the model handle before training the model, one is generated automatically. The session variable stores the model handle only for the duration of the connection. User variables are written as @var_name, and any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.

  3. When the training operation finishes, the model handle is assigned to the @model session variable, and the model is stored in the model catalog. View the entry in the model catalog with the following query. Replace user1 with your MySQL account name.

    mysql> SELECT model_id, model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG WHERE model_handle = 'topic_modeling_use_case';
    +----------+-------------------------+----------------------------+
    | model_id | model_handle            | train_table_name           |
    +----------+-------------------------+----------------------------+
    |        8 | topic_modeling_use_case | topic_modeling_data.movies |
    +----------+-------------------------+----------------------------+
    1 row in set (0.0449 sec)
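
Because @model holds the model handle for the current connection, the same catalog lookup can be sketched with the session variable instead of a literal string. This assumes that the session that ran ML_TRAIN is still open and that user1 is replaced with your MySQL account name.

```sql
-- Confirm the handle stored in the session variable.
SELECT @model;

-- Look up the catalog entry by the session variable instead of a literal.
SELECT model_id, model_handle, train_table_name
  FROM ML_SCHEMA_user1.MODEL_CATALOG
 WHERE model_handle = @model;
```
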
What's Next