
4.6.8 ML_EMBED_TABLE

The ML_EMBED_TABLE routine generates embeddings for the text stored in a table column, running multiple embedding generations in parallel batches.

This routine is available in HeatWave 9.0.1-u1 and later versions.

ML_EMBED_TABLE Syntax

mysql> call sys.ML_EMBED_TABLE('InputTableColumn', 'OutputTableColumn', [options]);
    
options: {
  JSON_OBJECT('key','value'[,'key','value'] ...)
    'key','value': {
    ['model_id', {'all_minilm_l12_v2'|'multilingual-e5-small'}]
    ['truncate', {true|false}]
    ['batch_size', BatchSize]
    }
}

ML_EMBED_TABLE takes the following parameters:

  • InputTableColumn: specifies the names of the input database, table, and column that contains the text to encode. The InputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified input table can be an internal or external table.

    • The specified input table must already exist, must not be empty, and must have a primary key.

    • The input column must already exist and must be of a TEXT or VARCHAR data type.

    • The input column must not be part of the primary key and must not contain NULL values or empty strings.

    • The DBName, TableName, and ColumnName must not contain backticks, and the DBName and TableName must not contain periods.

  • OutputTableColumn: specifies the names of the database, table, and column where the generated embeddings are stored. The OutputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified output table must be an internal table.

    • If the specified output table already exists, it must be the same as the input table, and the specified output column must not already exist in that table. The routine adds the output column to the table as a new VECTOR column. Because external tables are read only, an external input table cannot be used to store the output.

    • If the specified output table does not exist, a new table is created. The new output table has key columns that contain the same primary key values as the input table, and a VECTOR column that stores the generated embeddings.

    • The DBName, TableName, and ColumnName must not contain backticks, and the DBName and TableName must not contain periods.

  • options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:

    • model_id: specifies the embedding model to use for encoding the text. The default value is all_minilm_l12_v2. Possible values are:

      • all_minilm_l12_v2: for encoding English text.

      • multilingual-e5-small: for encoding text in supported languages other than English.

      To view the list of supported models, see Embedding Models. To view the list of supported languages, see Languages.

    • truncate: specifies whether to truncate inputs longer than the maximum token size. The default value is true.

    • batch_size: specifies the batch size for the routine. This parameter is supported for internal tables only. The default value is 1000. Possible values are integers from 1 to 1000.
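The naming and output-table rules above can be expressed as a small client-side check. The following Python sketch is illustrative only: `TableColumn`, `parse_table_column`, and `plan_output` are hypothetical helpers, not part of HeatWave, and `plan_output` assumes you have already looked up whether the output table exists, whether the input table is external, and which columns the input table has. Because only DBName and TableName are barred from containing periods, the sketch splits at the first two periods and assumes anything after the second period belongs to ColumnName. Names are compared as plain strings, ignoring MySQL's platform-dependent name case sensitivity.

```python
from typing import Collection, NamedTuple


class TableColumn(NamedTuple):
    """One DBName.TableName.ColumnName argument, split into its parts."""
    db: str
    table: str
    column: str


def parse_table_column(spec: str) -> TableColumn:
    """Split a DBName.TableName.ColumnName string (hypothetical client-side helper)."""
    if "`" in spec:
        raise ValueError("backticks are not allowed in DBName, TableName, or ColumnName")
    # DBName and TableName cannot contain periods, so the first two periods are the
    # separators; this sketch assumes any later periods belong to ColumnName.
    parts = spec.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DBName.TableName.ColumnName, got {spec!r}")
    return TableColumn(*parts)


def plan_output(
    inp: TableColumn,
    out: TableColumn,
    output_table_exists: bool,
    input_is_external: bool,
    input_columns: Collection[str],
) -> str:
    """Return 'create_table' or 'add_column' following the documented output rules."""
    if not output_table_exists:
        # A new table is created with the input's primary key values
        # and a VECTOR column for the embeddings.
        return "create_table"
    if (out.db, out.table) != (inp.db, inp.table):
        raise ValueError("an existing output table must be the input table")
    if input_is_external:
        raise ValueError("external tables are read only and cannot store the output")
    if out.column in input_columns:
        raise ValueError(f"output column {out.column!r} already exists")
    # The output column is added to the input table as a new VECTOR column.
    return "add_column"


inp = parse_table_column("demo_db.input_table.Input")
out = parse_table_column("demo_db.output_table.Output")
print(plan_output(inp, out, output_table_exists=False,
                  input_is_external=False, input_columns={"id", "Input"}))  # create_table
```

Keeping parsing and planning separate means the same parser serves both InputTableColumn and OutputTableColumn, while only the output side needs the extra table metadata.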

Syntax Examples

Generating embeddings for text stored in demo_db.input_table.Input and saving the generated embeddings in demo_db.output_table.Output using the all_minilm_l12_v2 embedding model:

mysql> call sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));
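The options and their documented defaults and ranges can also be checked before the routine is called. This Python sketch is hypothetical: `validate_options`, `sql_literal`, and `build_call` are illustrative names that only build a statement string and never connect to a server. It accepts only the two model IDs listed in this section (see Embedding Models for the supported list) and writes single-quoted string literals, which MySQL treats the same as the double-quoted strings in the example above unless the ANSI_QUOTES SQL mode is enabled.

```python
from typing import Any, Dict

# Documented option keys and their default values.
DEFAULTS: Dict[str, Any] = {"model_id": "all_minilm_l12_v2", "truncate": True, "batch_size": 1000}
# Model IDs listed in this section; see Embedding Models for the supported list.
MODEL_IDS = {"all_minilm_l12_v2", "multilingual-e5-small"}


def validate_options(options: Dict[str, Any], external_input: bool = False) -> Dict[str, Any]:
    """Check options against the documented rules and return them merged with defaults."""
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if "model_id" in options and options["model_id"] not in MODEL_IDS:
        raise ValueError(f"unsupported model_id {options['model_id']!r}")
    if "truncate" in options and not isinstance(options["truncate"], bool):
        raise ValueError("truncate must be true or false")
    if "batch_size" in options:
        if external_input:
            raise ValueError("batch_size is supported for internal tables only")
        size = options["batch_size"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(size, bool) or not isinstance(size, int) or not 1 <= size <= 1000:
            raise ValueError("batch_size must be an integer from 1 to 1000")
    return {**DEFAULTS, **options}


def sql_literal(value: Any) -> str:
    """Render a Python value as a MySQL literal (assumes the default SQL mode)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Escape backslashes and double any single quotes inside the string.
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"


def build_call(input_spec: str, output_spec: str, options: Dict[str, Any],
               external_input: bool = False) -> str:
    """Build a call statement; unsupplied options are left to the routine's defaults."""
    validate_options(options, external_input)
    args = [sql_literal(input_spec), sql_literal(output_spec)]
    if options:  # options are optional, matching [options] in the syntax
        pairs = ", ".join(f"{sql_literal(k)}, {sql_literal(v)}" for k, v in options.items())
        args.append(f"JSON_OBJECT({pairs})")
    return f"call sys.ML_EMBED_TABLE({', '.join(args)});"


print(build_call("demo_db.input_table.Input", "demo_db.output_table.Output",
                 {"model_id": "all_minilm_l12_v2"}))
```

With the options from the example above, `build_call` produces `call sys.ML_EMBED_TABLE('demo_db.input_table.Input', 'demo_db.output_table.Output', JSON_OBJECT('model_id', 'all_minilm_l12_v2'));`. Options that are not supplied are left out of the JSON_OBJECT, so the routine applies its own defaults.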