This section describes how to generate vector embeddings using the
ML_EMBED_ROW
HeatWave GenAI routine.
Vector embeddings are a numerical representation of the text that
capture the semantics of the data and relationships to other data.
You can pass the text string in the routine manually or use data
from tables in your database. To embed multiple rows of text
stored in a table in a single run, you can even
run a batch query.
Using this method, you can create vector embedding tables that you
can use to perform similarity searches using the
DISTANCE()
function, without
setting up a vector store.
This method does not support embedding unstructured data. To learn how to create vector embeddings for unstructured data, see Section 4.4, “Performing a Vector Search”.
Review the Requirements.
Connect to your HeatWave Database System.
For Running Batch Queries, add the text that you want to embed to a column in a new or existing table.
To generate a vector embedding using HeatWave GenAI, perform the following steps:
-
To define the text that you want to encode, set the
@text
variable:set @text="TextToEncode";
Replace
TextToEncode
with the text that you want to encode. For example:set @text="HeatWave GenAI is a feature of HeatWave that lets you communicate with unstructured data in HeatWave using natural-language queries.";
-
To generate a vector embedding for the specified text, pass the text to the embedding model using the
ML_EMBED_ROW
routine:select sys.ML_EMBED_ROW(@text, JSON_OBJECT("model_id", "EmbeddingModel")) into @text_embedding;
Replace
EmbeddingModel
with ID of the embedding model you want to use. To view the lists of supported models, see HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.For example:
select sys.ML_EMBED_ROW(@text, JSON_OBJECT("model_id", "all_minilm_l12_v2")) into @text_embedding;
The routine returns a
VECTOR
, and this commad stores it in the@text_embedding
variable. -
Print the vector embedding stored in the
@text_embedding
variable:select @text_embedding;
The output, which is a binary representation of the specified text, looks similar to the following:
-----------------------------------------------------+ | 0xEB0FE93C21737D3C4ED2F0BC6DCC06BD0668DBBB15D1ABBBDF3E03BD09DC21BC229512BD06B602BD4824F6BCFFEF70BDFFB53C3D98A2BB3C02E0D63DD3C79BBC8A5B45BD29B32F3C63040DBD17AE91BDECEEED3C3B8A003D05E03CBB42B2203D9F41643D404E4BBDEA4B93BBACED3D3DA0C2C43C8D2B55BCFED9103E280EA93D8424503CEE8A143D7A9BD03C0553AFBD203443BD9276E23C024D15BDE8A6C83D8FD2E9BC401DA33AD661AFBDCA06633CEBB1823D11335C3D14B292BDE38028BB335F273B55BA053D3472BFBDF60A64BD4B91B3BC6039BE3D37A5763DA85A863D4D08363B3F1E6DBD732F74BDA3BA0B3C0B436BBC4E4081BDBD4866BDB4F1AB3C9E0692BCAE0160BB6D782D3DD321B83DC521813DE090D7BD8A96EEBC334FFDBCD0F146BD17217D3DAD8A333C3B427A3DE93EDC3C673712BDE8B21FBC52011A3A3A058A3D0BC534BD8A19AB3B8C1B493C731CB03BDC0894BD6E7180BD44AC2DBC5131B03C108D8B3D61E13FBCB2B69A3C10423B3DFDAC73BD33883A3DEB357E3D3C27363D34599BBD5D9A523D63441D3D95F8EFBC7D74CE3BD420AEBD5A7B313CB1A6BDBCEAFAA53C07A01EBD02FF373D8912B73C4E16D5BD00EC99BD50A0ED3B121129BB44AB1D3D56CA8A3D2BF183BD500A653C37F5B1BB2EEA3DBD443FB13DE5FD6E3C0ACC27BDF4F724BD73EB193D32A2773D37FCA8BDA279D0BA13730EBC94238A3DBD72D13DC3072B3D0FF8F33A8404B53DFFC7983D79FD213DB5A0E7BBDBE31A3D7C7A8DBDFDA55CBA99FEA73D28E9F1BC87C1AD3D7934563BD80E4EBD51ED65BCE5199DBC122A08BD7B35A43B344EADBD07E41C3D028CA13D6385933CB2A623BDA866093DE4BE843D48F772B8A9DA973D152E733D55FF363D56AAC33C0F18703D1F39043D45AB0D3CA7992C3C9053F3BDFE1718BEB199F43C579F8F3D7489B1BDCE51B6BDA864CDBC8BC8B13D681E83BD7EFB0ABD01A5E8BD7B5949BDFAEA713DAADBE7BC417FACBD5050A33D8BBC263C560A3BBD96AC13BD4E5D19BC3691C13D5F78FEBC383990BDED5E65BD12AD23BD139B883CB678763B4A943F3B3CF6DB3CC0FF0B3DC45AA13C00D893BCCC560A3D2EC4EBBABF3F10BD8FE40F3C1FEE8EBD80829B3C10F186BDAA0A1E3D7245EE3CDFB41BBDC0BD08BC209B813D337860BD7D5621BDBD5850BCDEB93B3DD6ED30BD1E526DBBF98F703DB2884ABC35A25B3C9895213C6040EEBC8DA4473C6690083EE09C2EB77D972FBD60B7180A69C488BCAE6529BDF98CABBD175DCA3D85BA56BDE540C73CE677F43CB382C03D4C79A63D10E2163CF3EB3ABDCED222BCFF398BBC9E268BBD669E93BD68A7383CF162293D4E138C3CB3E0C13DB96E243D7DD984BDD3C3ABBC818188BD7DA8593B679E623DADE31D3D01DED53C39277ABD2460B4BC4ABFDEBCA7224EBD5F9C233AB77405BC8B532DBDEE7F15BCFD05C43D1D5E953D1FD8B3BCF1EBED3C1BAC3BBCC9FF2B3D097588BA2F14FB3B5BB6383DF85C71BD7A877A3D2542C3BD53824E3D576B9FBD5C247BBD9AD0C03DEB77F7BCA5804B3DCFF8A7BCBAFF893D6EDBACBDBB20A83C24884BBDDF61C0BD527A92BD4822583D99687EBDFD682DBDE80E65BD717093BC46A3993D6A4F143E15EE8FBC2A81A7BDF811EFBCFBCF883D3F739CBC2BC0863DFBBADDBCC72B823C9B9CE2BC631448BD203017BD853B94BD42BC723D4A1807BBB888753C99A10B3D5390613CAF9E623DC7F5A5BDF4A1653CA17EF6BBBFAC35BC17394BBB3DF39A3D7ACCD83D0DEACCBDEF29C3BC2814633DBD40870BC6CFDE3C759CE9BC754AD03CA0849D3D64CE9B3C5569293CD36D57BDE9A267BA6CFA503D00BB36BDAE16833C01188DBDA0737B3DDB913C3C0FD08FBD8C9DF6BC0D2CE23CFE0771BCC4F6583D9E730EBD6E8CA4BCBECB8C3DE1F09B3B3D4034BBE53806BD92AE80BCF2F8D7BC2385493C679C493DC2324CBD266765BC4FC494BC79364C3D8AB7303DE0E4B8BCBDEF903DCE3DD6BC220727BC589FC2BC22A94DBDAA82983BC78C8F3DCB6AC2BA31089DBCEB49FC3C2510EC3B3F05083D90DDDBBDA1E15ABD80362E3D9936A3BDF401D03C6CDB893DF3684ABDFB2DB7BB76ED95BD03A3FE3C2371B5BDBD5F7EBB1AACF7BB156BE53C1E8A9BBC440E2CBDFF1B99BD |
To encode multiple rows of text strings stored in a table
column, in parallel, use the
ML_EMBED_TABLE
routine. This
method is faster than running the
ML_EMBED_ROW
routine multiple times.
To alter an existing table or create a new table, MySQL
requires you to set the
sql-require-primary-key
system variable to 0
.
The ML_EMBED_TABLE
routine is supported in
MySQL 9.0.1-u1 and later versions.
-
Run batch queries using
ML_EMBED_TABLE
:call sys.ML_EMBED_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", JSON_OBJECT("model_id", "EmbeddingModel"));
Replace the following:
InputDBName
: the name of the database that contains the table column where your input queries are stored.InputTableName
: the name of the table that contains the column where your input queries are stored.InputColumn
: the name of the column that contains input queries.OutputDBName
: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.OutputTableName
: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.OutputColumn
: the name for the new column where you want to store the output generated for the input queries.EmbeddingModel
: ID of the embedding model to use. To view the lists of supported models, see HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.
For example:
call sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));