MySQL AI  /  ...  /  ML_EMBED_TABLE

7.2.8 ML_EMBED_TABLE

The ML_EMBED_TABLE routine runs multiple embedding generations in a batch, in parallel.

This topic contains the following sections:

To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.

ML_EMBED_TABLE Syntax

mysql> CALL sys.ML_EMBED_TABLE('InputTableColumn', 'OutputTableColumn'[, options]);
    
options: JSON_OBJECT(keyvalue[, keyvalue] ...)
keyvalue: 
{
  'model_id', {'ModelID'}
  |'truncate', {true|false}
  |'batch_size', BatchSize
  |'details_column', 'ErrorDetailsColumnName'
}

Following are ML_EMBED_TABLE parameters:

  • InputTableColumn: specifies the names of the input database, table, and column that contains the text to encode. The InputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified input table can be an internal or external table.

    • The specified input table must already exist, must not be empty, and must have a primary key.

    • The input column must already exist and must contain text or varchar values.

    • The input column must not be a part of the primary key and must not have NULL values or empty strings.

    • There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.

  • OutputTableColumn: specifies the names of the database, table, and column where the generated embeddings are stored. The OutputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified output table must be an internal table.

    • If the specified output table already exists, then it must be the same as the input table. And, the specified output column must not already exist in the input table. A new VECTOR column is added to the table. External tables are read only. So if input table is an external table, then it cannot be used to store the output.

    • If the specified output table doesn't exist, then a new table is created. The new output table has key columns which contains the same primary key values as the input table and a VECTOR column that stores the generated embeddings.

    • There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.

  • options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:

    • model_id: specifies the embedding model to use for encoding the text. Default value is multilingual-e5-small. Possible values are:

      • all_minilm_l12_v2 or minilm

      • multilingual-e5-small

      To view the lists of available embedding models, see In-Database Embedding Model.

    • truncate: specifies whether to truncate inputs longer than the maximum token size. Default value is true.

    • batch_size: specifies the batch size for the routine. This option is supported for internal tables only. Default value is 1000. Possible values are integer values between 1 and 1000.

    • details_column: specifies a name for the output table column that is created for adding details of errors encountered for rows that aren't processed successfully by the routine. Ensure that a column by the specified name does not already exist in the table. Default value is details.

Syntax Examples

Generate embeddings for text stored in demo_db.input_table.Input using the all_minilm_l12_v2 embedding model, and save the generated embeddings in the output table demo_db.output_table.Output:

mysql> CALL sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));