HeatWave User Guide  /  HeatWave GenAI  /  Generating Vector Embeddings

4.6 Generating Vector Embeddings

This section describes how to generate vector embeddings using the ML_EMBED_ROW HeatWave GenAI routine. Vector embeddings are a numerical representation of the text that capture the semantics of the data and relationships to other data. You can pass the text string in the routine manually or use data from tables in your database. To embed multiple rows of text stored in a table in a single run, you can even run a batch query.

Using this method, you can create vector embedding tables that you can use to perform similarity searches using the DISTANCE() function, without setting up a vector store.

Note

This method does not support embedding unstructured data. To learn how to create vector embeddings for unstructured data, see Section 4.4, “Performing a Vector Search”.

Before You Begin

  • Connect to your HeatWave Database System.

  • For Running Batch Queries, add the text that you want to embed to a column in a new or existing table.

Generating a Vector Embedding for Specified Text

To generate a vector embedding using HeatWave GenAI, perform the following steps:

  1. To define the text that you want to encode, set the @text session variable:

    set @text="TextToEncode";

    Replace TextToEncode with the text that you want to encode. For example:

    set @text="HeatWave GenAI is a feature of HeatWave that lets you communicate with unstructured data in HeatWave using natural-language queries.";
  2. To generate a vector embedding for the specified text, pass the text to the embedding model using the ML_EMBED_ROW routine:

    select sys.ML_EMBED_ROW(@text, JSON_OBJECT("model_id", "EmbeddingModel")) into @text_embedding;

    Replace EmbeddingModel with ID of the embedding model you want to use. To view the lists of supported models, see HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.

    For example:

    select sys.ML_EMBED_ROW(@text, JSON_OBJECT("model_id", "all_minilm_l12_v2")) into @text_embedding;

    The routine returns a VECTOR, and this commad stores it in the @text_embedding session variable.

  3. Print the vector embedding stored in the @text_embedding session variable:

    select @text_embedding;

    The output, which is a binary representation of the specified text, looks similar to the following:

      -----------------------------------------------------+
      | 0xEB0FE93C21737D3C4ED2F0BC6DCC06BD0668DBBB15D1ABBBDF3E03BD09DC21BC229512BD06B602BD4824F6BCFFEF70BDFFB53C3D98A2BB3C02E0D63DD3C79BBC8A5B45BD29B32F3C63040DBD17AE91BDECEEED3C3B8A003D05E03CBB42B2203D9F41643D404E4BBDEA4B93BBACED3D3DA0C2C43C8D2B55BCFED9103E280EA93D8424503CEE8A143D7A9BD03C0553AFBD203443BD9276E23C024D15BDE8A6C83D8FD2E9BC401DA33AD661AFBDCA06633CEBB1823D11335C3D14B292BDE38028BB335F273B55BA053D3472BFBDF60A64BD4B91B3BC6039BE3D37A5763DA85A863D4D08363B3F1E6DBD732F74BDA3BA0B3C0B436BBC4E4081BDBD4866BDB4F1AB3C9E0692BCAE0160BB6D782D3DD321B83DC521813DE090D7BD8A96EEBC334FFDBCD0F146BD17217D3DAD8A333C3B427A3DE93EDC3C673712BDE8B21FBC52011A3A3A058A3D0BC534BD8A19AB3B8C1B493C731CB03BDC0894BD6E7180BD44AC2DBC5131B03C108D8B3D61E13FBCB2B69A3C10423B3DFDAC73BD33883A3DEB357E3D3C27363D34599BBD5D9A523D63441D3D95F8EFBC7D74CE3BD420AEBD5A7B313CB1A6BDBCEAFAA53C07A01EBD02FF373D8912B73C4E16D5BD00EC99BD50A0ED3B121129BB44AB1D3D56CA8A3D2BF183BD500A653C37F5B1BB2EEA3DBD443FB13DE5FD6E3C0ACC27BDF4F724BD73EB193D32A2773D37FCA8BDA279D0BA13730EBC94238A3DBD72D13DC3072B3D0FF8F33A8404B53DFFC7983D79FD213DB5A0E7BBDBE31A3D7C7A8DBDFDA55CBA99FEA73D28E9F1BC87C1AD3D7934563BD80E4EBD51ED65BCE5199DBC122A08BD7B35A43B344EADBD07E41C3D028CA13D6385933CB2A623BDA866093DE4BE843D48F772B8A9DA973D152E733D55FF363D56AAC33C0F18703D1F39043D45AB0D3CA7992C3C9053F3BDFE1718BEB199F43C579F8F3D7489B1BDCE51B6BDA864CDBC8BC8B13D681E83BD7EFB0ABD01A5E8BD7B5949BDFAEA713DAADBE7BC417FACBD5050A33D8BBC263C560A3BBD96AC13BD4E5D19BC3691C13D5F78FEBC383990BDED5E65BD12AD23BD139B883CB678763B4A943F3B3CF6DB3CC0FF0B3DC45AA13C00D893BCCC560A3D2EC4EBBABF3F10BD8FE40F3C1FEE8EBD80829B3C10F186BDAA0A1E3D7245EE3CDFB41BBDC0BD08BC209B813D337860BD7D5621BDBD5850BCDEB93B3DD6ED30BD1E526DBBF98F703DB2884ABC35A25B3C9895213C6040EEBC8DA4473C6690083EE09C2EB77D972FBD60B7180A69C488BCAE6529BDF98CABBD175DCA3D85BA56BDE540C73CE677F43CB382C03D4C79A63D10E2163CF3EB3ABDCED222BCFF398BBC9E268BBD669E93BD68A7383CF162293D4E138C3CB3E0C13DB96E243D7DD984BDD3C3ABBC818188BD7DA8593B679E623DADE31D3D01DED53C39277ABD2460B4BC4ABFDEBCA7224EBD5F9C233AB77405BC8B532DBDEE7F15BCFD05C43D1D5E953D1FD8B3BCF1EBED3C1BAC3BBCC9FF2B3D097588BA2F14FB3B5BB6383DF85C71BD7A877A3D2542C3BD53824E3D576B9FBD5C247BBD9AD0C03DEB77F7BCA5804B3DCFF8A7BCBAFF893D6EDBACBDBB20A83C24884BBDDF61C0BD527A92BD4822583D99687EBDFD682DBDE80E65BD717093BC46A3993D6A4F143E15EE8FBC2A81A7BDF811EFBCFBCF883D3F739CBC2BC0863DFBBADDBCC72B823C9B9CE2BC631448BD203017BD853B94BD42BC723D4A1807BBB888753C99A10B3D5390613CAF9E623DC7F5A5BDF4A1653CA17EF6BBBFAC35BC17394BBB3DF39A3D7ACCD83D0DEACCBDEF29C3BC2814633DBD40870BC6CFDE3C759CE9BC754AD03CA0849D3D64CE9B3C5569293CD36D57BDE9A267BA6CFA503D00BB36BDAE16833C01188DBDA0737B3DDB913C3C0FD08FBD8C9DF6BC0D2CE23CFE0771BCC4F6583D9E730EBD6E8CA4BCBECB8C3DE1F09B3B3D4034BBE53806BD92AE80BCF2F8D7BC2385493C679C493DC2324CBD266765BC4FC494BC79364C3D8AB7303DE0E4B8BCBDEF903DCE3DD6BC220727BC589FC2BC22A94DBDAA82983BC78C8F3DCB6AC2BA31089DBCEB49FC3C2510EC3B3F05083D90DDDBBDA1E15ABD80362E3D9936A3BDF401D03C6CDB893DF3684ABDFB2DB7BB76ED95BD03A3FE3C2371B5BDBD5F7EBB1AACF7BB156BE53C1E8A9BBC440E2CBDFF1B99BD |

Running Batch Queries

To encode multiple rows of text strings stored in a table column, in parallel, use the ML_EMBED_TABLE routine. This method is faster than running the ML_EMBED_ROW routine multiple times.

Note

The ML_EMBED_TABLE routine is supported in MySQL 9.0.1-u1 and later versions.

  • Run batch queries using ML_EMBED_TABLE:

    call sys.ML_EMBED_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", JSON_OBJECT("model_id", "EmbeddingModel"));

    Replace the following:

    • InputDBName: the name of the database that contains the table column where your input queries are stored.

    • InputTableName: the name of the table that contains the column where your input queries are stored.

    • InputColumn: the name of the column that contains input queries.

    • OutputDBName: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.

    • OutputTableName: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.

    • OutputColumn: the name for the new column where you want to store the output generated for the input queries.

    • EmbeddingModel: ID of the embedding model to use. To view the lists of supported models, see HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.

    For example:

    call sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));