The ML_EMBED_TABLE routine runs multiple embedding generations in a batch, in parallel.
In versions older than MySQL 9.2.1, to alter an existing table or create a new table, MySQL requires you to set the sql-require-primary-key system variable to 0.
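On servers older than MySQL 9.2.1, the requirement above might be met for the current session as sketched below. In SQL statements the variable name is written with underscores (sql_require_primary_key). Whether a session-level change is enough for your deployment, and whether your account is allowed to change this variable, are assumptions to verify.

```sql
-- Check the current value (1 means new or altered tables must have a primary key).
SELECT @@SESSION.sql_require_primary_key;

-- Relax the requirement for this session only, before calling the routine.
SET SESSION sql_require_primary_key = 0;
```

Limiting the change to the session leaves the global setting untouched for other connections.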
This routine is available in MySQL 9.0.1-u1 and later versions.
To learn about the privileges you need to run this routine, see Required Privileges.
mysql> call sys.ML_EMBED_TABLE('InputTableColumn', 'OutputTableColumn', [options]);

options: {
  JSON_OBJECT('key','value'[,'key','value'] ...)
    'key','value': {
    ['model_id', {'all_minilm_l12_v2'|'minilm'|'multilingual-e5-small'|'cohere.embed-english-v3.0'|'cohere.embed-multilingual-v3.0'}]
    ['truncate', {true|false}]
    ['batch_size', BatchSize]
    }
}
Following are ML_EMBED_TABLE parameters:
- InputTableColumn: specifies the names of the input database, table, and column that contains the text to encode. The InputTableColumn is specified in the following format: DBName.TableName.ColumnName.
  - The specified input table can be an internal or external table.
  - The specified input table must already exist, must not be empty, and must have a primary key.
  - The input column must already exist and must contain text or varchar values.
  - The input column must not be a part of the primary key and must not have NULL values or empty strings.
  - There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.
- OutputTableColumn: specifies the names of the database, table, and column where the generated embeddings are stored. The OutputTableColumn is specified in the following format: DBName.TableName.ColumnName.
  - The specified output table must be an internal table.
  - If the specified output table already exists, then it must be the same as the input table, and the specified output column must not already exist in the input table. A new VECTOR column is added to the table. External tables are read-only, so if the input table is an external table, it cannot be used to store the output.
  - If the specified output table doesn't exist, then a new table is created. The new output table has key columns that contain the same primary key values as the input table and a VECTOR column that stores the generated embeddings.
  - There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.
- options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:
  - model_id: specifies the embedding model to use for encoding the text. Default value is all_minilm_l12_v2. Possible values are:
    - all_minilm_l12_v2 or minilm: As of MySQL 9.2.1, for encoding text or files in any supported language. In previous versions of MySQL, for encoding text or files in English only.
    - multilingual-e5-small: As of MySQL 9.2.1, for encoding text or files in any supported language. In previous versions of MySQL, for encoding text or files in supported languages other than English. This embedding model is available in MySQL 9.0.1-u1 and later versions.
    - cohere.embed-english-v3.0: As of MySQL 9.2.1, for encoding text or files in any supported language. In previous versions of MySQL, for encoding text or files in English only. This embedding model is available in MySQL 9.0.1-u1 and later versions.
    - cohere.embed-multilingual-v3.0: As of MySQL 9.2.1, for encoding text or files in any supported language. In previous versions of MySQL, for encoding text or files in supported languages other than English. This embedding model is available in MySQL 9.0.1-u1 and later versions.
    To view the lists of available embedding models, see HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models. To view the list of supported languages, see Languages.
  - truncate: specifies whether to truncate inputs longer than the maximum token size. Default value is true.
  - batch_size: specifies the batch size for the routine. This parameter is supported for internal tables only. Default value is 1000. Possible values are integer values between 1 and 1000.
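The parameters above can be combined as in the following sketch. It assumes an existing internal table demo_db.input_table with a text column Input and a primary key. The output names demo_db.output_table_opts and InputEmbedding are hypothetical and chosen only for illustration.

```sql
-- Variant 1: write embeddings to a new table, with truncation disabled
-- and a smaller batch size (batch_size accepts integers from 1 to 1000).
CALL sys.ML_EMBED_TABLE(
  'demo_db.input_table.Input',
  'demo_db.output_table_opts.Output',   -- hypothetical output table and column
  JSON_OBJECT('model_id', 'multilingual-e5-small',
              'truncate', false,
              'batch_size', 100)
);

-- Variant 2: store embeddings in the input table itself.
-- The output column must not already exist; it is added as a new VECTOR column.
CALL sys.ML_EMBED_TABLE(
  'demo_db.input_table.Input',
  'demo_db.input_table.InputEmbedding', -- hypothetical new column name
  JSON_OBJECT('model_id', 'all_minilm_l12_v2')
);
```

Variant 2 only applies when the input table is internal, because external tables are read-only.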
Consider the following input table demo_db.input_table:
mysql> select * from demo_db.input_table;
+----+----------------------------------+
| id | Input                            |
+----+----------------------------------+
|  1 | What is artificial intelligence? |
|  2 | What is MySQL?                   |
+----+----------------------------------+
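Before running the routine, the input requirements listed for InputTableColumn can be checked with ordinary queries. The following is a minimal sketch against demo_db.input_table. The result aliases invalid_rows and total_rows are illustrative, and the checks are a convenience, not part of the routine itself.

```sql
-- Rows that violate the input-column rules (NULL values or empty strings).
-- This count should be 0 before the routine is run.
SELECT COUNT(*) AS invalid_rows
FROM demo_db.input_table
WHERE Input IS NULL OR Input = '';

-- The input table must not be empty, so this count should be greater than 0.
SELECT COUNT(*) AS total_rows
FROM demo_db.input_table;

-- List the primary key columns. The result must be non-empty
-- and must not include the Input column.
SELECT COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'demo_db'
  AND TABLE_NAME = 'input_table'
  AND CONSTRAINT_NAME = 'PRIMARY';
```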
Generate embeddings for text stored in demo_db.input_table.Input using the all_minilm_l12_v2 embedding model, and save the generated embeddings in the output table demo_db.output_table.Output:
mysql> call sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));
The output table contains the following fields:
mysql> describe demo_db.output_table;
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id     | int          | NO   | PRI | NULL    |       |
| Output | vector(2048) | NO   |     | NULL    |       |
+--------+--------------+------+-----+---------+-------+
View the contents of the output table:
mysql> select * from demo_db.output_table;
+----+--------+
| id | Output |
+----+--------+
|  1 | 0x     |
|  2 | 0x     |
+----+--------+
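Raw VECTOR values are displayed as binary data, which is hard to read. As a sketch, and assuming the standard MySQL vector functions VECTOR_DIM and VECTOR_TO_STRING are available on your server, the stored embeddings can be inspected in a more readable form. The aliases dims and preview are illustrative.

```sql
-- Show the number of entries in each embedding and a short text preview
-- of its values instead of the full binary representation.
SELECT id,
       VECTOR_DIM(Output) AS dims,
       LEFT(VECTOR_TO_STRING(Output), 60) AS preview
FROM demo_db.output_table;
```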
As of MySQL 9.2.1, to specify the embedding model used to generate the vector embeddings, the routine adds the following comment for the VECTOR column in the output table:
'GENAI_OPTIONS=EMBED_MODEL_ID=EmbeddingModelID'
For example:
mysql> SHOW CREATE TABLE output_table;
+--------------+--------------------------------------------------------------------------------+
| Table        | Create Table                                                                   |
+--------------+--------------------------------------------------------------------------------+
| output_table | CREATE TABLE `output_table` (
  `id` int NOT NULL DEFAULT '0',
  `Output` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------------+--------------------------------------------------------------------------------+
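The same column comment can also be read without parsing the full CREATE TABLE statement. The following sketch queries information_schema for the demo_db.output_table.Output column used in the examples. It assumes MySQL 9.2.1 or later, where the routine writes the comment.

```sql
-- Read the comment that records the embedding model for the VECTOR column.
SELECT COLUMN_NAME, COLUMN_COMMENT
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'demo_db'
  AND TABLE_NAME = 'output_table'
  AND COLUMN_NAME = 'Output';
```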