This section describes how to generate vector embeddings for files or folders stored in , and load the embeddings into a vector store table.
HeatWave GenAI supports the following methods to ingest files from the bucket:
Review the Requirements and Required Privileges.
-
If not already done, create an Oracle Cloud Infrastructure (OCI) bucket for storing files that you want to ingest into the vector store.
Then, upload the files to the bucket.
Vector store can ingest files in the following formats: PDF, PPT, TXT, HTML, and DOC.
-
Connect to your HeatWave Database System.
Press CTRL+C to copymysqlsh -uAdmin -pPassword -hPrivateIP --sqlc
NoteX protocol is not supported for Ingesting Files Using Asynchronous Load. To set up a vector store using this method, ensure that you use the classic MySQL protocol while connecting to the database.
Replace the following:
Admin
: the database system admin name.Password
: the database system password.PrivateIP
: the private IP address of the database system.
The
VECTOR_STORE_LOAD
routine creates and loads vector embeddings asynchronously
into the vector store. You can ingest the source files into
the vector store using the following methods:
This section describes how to load source documents from the bucket into the vector table using the uniform resource identifier (URI) of the object.
To use this method, you need to enable the database system to access an Oracle Cloud Infrastructure bucket. For more information, see Resource Principals.
To set up a new vector store using an object URI, perform the following steps:
-
To create the vector store table, use a new or existing database:
Press CTRL+C to copyuse DBName;
Replace
DBName
with the database name. -
If you are loading a vector store table on a database system for the first time, run the following command to create a new schema dedicated for task management:
Press CTRL+C to copyselect mysql_task_management_ensure_schema();
This creates a new schema,
mysql_task_management
, which keeps track of the tasks that run in the background. Themysql_task_management
schema contains internal tables that store the task details such as task ID and task logs. These internal tables must not be altered as it can cause the asynchronous task created byVECTOR_STORE_LOAD
to fail. -
Optionally, to specify a name for the vector store table and language to use, set the
@options
variable:Press CTRL+C to copyset @options = JSON_OBJECT("table_name", "VectorStoreTableName", "language", "Language");
Replace the following:
VectorStoreTableName
: the name you want for the vector store table.-
Language
: the two-letterISO 639-1
code for the language you want to use. Default language isen
, which is English. To view the list of supported languages, see Languages.The
language
parameter is supported in MySQL 9.0.1-u1 and later versions.
For example:
Press CTRL+C to copyset @options = JSON_OBJECT("table_name", "demo_embeddings", "language", "en");
In MySQL 9.1.2 and later versions, Optical Character Recognition (OCR) is enabled by default.
To learn more about the available routine options, see VECTOR_STORE_LOAD Syntax.
-
To ingest the file from the , create vector embeddings, and load the vector embeddings into HeatWave, use the
VECTOR_STORE_LOAD
routine:Press CTRL+C to copycall sys.VECTOR_STORE_LOAD("oci://BucketName@Namespace/Path/Filename", @options);
Replace the following:
BucketName
: the OCI Object Storage bucket name.Namespace
: the name of the bucket namespace.Path
: path to the folder that contains the source file.Filename
: the filename with the file extension.
For example:
Press CTRL+C to copycall sys.VECTOR_STORE_LOAD("oci://demo_bucket@demo_namespace/demo_folder/demo_file.pdf", @options);
This creates an asynchronous task that runs in background and loads the vector embeddings into the specified vector store table. The output of the
VECTOR_STORE_LOAD
routine contains the following:An ID of the task which was created.
A task query that you can use to track the progress of task.
If HeatWave GenAI detects multiple files with the same or different file formats in a single load, it creates a separate table for every format it finds. The table name for each format is the specified or default table name followed by the format. For example,
demo_embeddings_pdf
is the name of the table that contains PDF files. -
After the task is completed, verify that embeddings are loaded in the vector store table:
Press CTRL+C to copyselect count(*) from VectorStoreTableName;
For example:
Press CTRL+C to copyselect count(*) from demo_embeddings;
If you see a numerical value in the output, your embeddings are successfully loaded in the vector store table.
-
To view the details of the vector store table, use the following statement:
Press CTRL+C to copydescribe demo_embeddings;
The output shows the details of the table:
Press CTRL+C to copy+-------------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+---------------+------+-----+---------+-------+ | document_name | varchar(1024) | NO | | NULL | | | metadata | json | NO | | NULL | | | document_id | int unsigned | NO | PRI | NULL | | | segment_number | int unsigned | NO | PRI | NULL | | | segment | varchar(1024) | NO | | NULL | | | segment_embedding | vector(384) | NO | | NULL | | +-------------------+---------------+------+-----+---------+-------+
This section describes how to ingest source documents from the using pre-authenticated requests (PAR). Use this method if OCI bucket access is not enabled on your database system.
For confidential data, Using the Uniform Resource Identifier with Asynchronous Load is recommended for ingesting the source files into the vector store as it is a more secure method.
To set up a new vector store, perform the following steps:
-
To create the vector store table, use a new or existing database:
Press CTRL+C to copyuse DBName;
Replace
DBName
with the database name. -
If you are loading a vector store table on a database system for the first time, run the following command to create a new schema dedicated for task management:
Press CTRL+C to copyselect mysql_task_management_ensure_schema();
This creates a new schema,
mysql_task_management
, which keeps track of the tasks that run in the background. Themysql_task_management
schema contains internal tables that store the task details such as task ID and task logs. These internal tables must not be altered as it can cause the asynchronous task created byVECTOR_STORE_LOAD
to fail. -
Optionally, to specify a name for the vector store table and language to use, set the
@options
variable:Press CTRL+C to copyset @options = JSON_OBJECT("table_name", "VectorStoreTableName", "language", "Language");
Replace the following:
VectorStoreTableName
: the name you want for the vector store table.-
Language
: the two-letterISO 639-1
code for the language you want to use. Default language isen
, which is English. To view the list of supported languages, see Languages.The
language
parameter is supported in MySQL 9.0.1-u1 and later versions.
For example:
Press CTRL+C to copyset @options = JSON_OBJECT("table_name", "demo_embeddings", "language", "en");
In MySQL 9.1.2 and later versions, Optical Character Recognition (OCR) is enabled by default.
To learn more about the available routine options, see VECTOR_STORE_LOAD Syntax.
-
To ingest the file from the , create vector embeddings, and load the vector embeddings into HeatWave, use the
VECTOR_STORE_LOAD
routine:Press CTRL+C to copycall sys.VECTOR_STORE_LOAD("PAR", @options);
Replace
PAR
with PAR of the bucket, folder, or file that you want to use to set up the vector store.To learn how to create PAR for your , see Creating a PAR Request in .
NoteIf you are creating a PAR for a folder or Object Storage bucket, then select Enable Object Listing in the Create Pre-Authenticated Request dialog to enable object listing.
For example:
Press CTRL+C to copycall sys.VECTOR_STORE_LOAD("https://demo.objectstorage.us-ashburn-1.oci.customer-oci.com/p/demo-url/n/demo/b/demo-bucket/o/demo_file.pdf", @options);
This creates a task that runs in background and loads the vector embeddings into the specified vector store table. The output of the
VECTOR_STORE_LOAD
routine contains the following:An ID of the task which was created.
A task query that you can use to track the progress of task.
If HeatWave GenAI detects files with different file formats in a single load, it creates a separate table for every format it finds. The table name for each format is the specified or default table name followed by the format. For example,
demo_embeddings_pdf
is the name of the table that contains PDF files. -
After the task is completed, verify that embeddings are loaded in the vector store table:
Press CTRL+C to copyselect count(*) from VectorStoreTableName;
For example:
Press CTRL+C to copyselect count(*) from demo_embeddings;
If you see a numerical value in the output, your embeddings are successfully loaded in the vector store table.
-
To view the details of the vector store table, use the following statement:
Press CTRL+C to copydescribe demo_embeddings;
The output shows the details of the table:
Press CTRL+C to copy+-------------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+---------------+------+-----+---------+-------+ | document_name | varchar(1024) | NO | | NULL | | | metadata | json | NO | | NULL | | | document_id | int unsigned | NO | PRI | NULL | | | segment_number | int unsigned | NO | PRI | NULL | | | segment | varchar(1024) | NO | | NULL | | | segment_embedding | vector(384) | NO | | NULL | | +-------------------+---------------+------+-----+---------+-------+
The
HEATWAVE_LOAD
routine creates and loads vector embeddings into the vector
store using auto parallel load.
To ingest files using the
HEATWAVE_LOAD
routine, perform the following steps:
-
In your HeatWave Database System, create and use a new database:
Press CTRL+C to copycreate database DBName; use DBName;
Replace
DBName
with the name you want for the new database. -
To ingest the file from Object Storage and create vector embeddings in a new vector store table, set the
@dl_tables
variable:Press CTRL+C to copyset @dl_tables = '[ { "db_name": "DBName", "tables": [ { "table_name": "VectorStoreTableName", "engine_attribute": { "dialect": {"format": "FileFormat", "language": "Language"}, "file": [ {"par": "PAR"} ] } } ] }]';
Replace the following:
DBName
: the database name.VectorStoreTableName
: the name you want for the vector store table where the vector embeddings are stored.FileFormat
: the formats of the files to be ingested into the vector store table. The supported file formats arehtml
,pdf
,ppt
,pptx
txt
,doc
, anddocx
. To ingest multiple files with different unstructured data file formats into the vector store table in a single load, replaceFileFormat
withauto_unstructured
.-
Language
: the two-letterISO 639-1
code for the language you want to use. Default language isen
, which is English. To view the list of supported languages, see Languages.The
language
parameter is supported in MySQL 9.0.1-u1 and later versions. -
PAR
: the pre-authenticated request (PAR) detail of the bucket, folder, or file that you want to use to set up the vector store.To learn how to create PAR for your , see Creating a PAR request in .
NoteIf you are creating a PAR for a folder or Object Storage bucket, then select Enable Object Listing to enable object listing in the Create Pre-Authenticated Request dialog while creating the PAR.
In MySQL 9.1.2 and later versions, Optical Character Recognition (OCR) is enabled by default.
In MySQL 9.1.0 and 9.1.1, to enable OCR, set the
ocr
parameter totrue
.For example:
Press CTRL+C to copyset @dl_tables = '[ { "db_name": "demo_db", "tables": [ { "table_name": "demo_embeddings", "engine_attribute": { "dialect": {"format": "pdf", "language": "en"}, "file": [ {"par": "https://demo.objectstorage.us-ashburn-1.oci.customer-oci.com/p/demo-url/n/demo/b/demo_bucket/o/demo_file.pdf"} ] } } ] }]';
-
To prepare for loading the vector embeddings into the HeatWave system, set the
@options
variable:Press CTRL+C to copyset @options = JSON_OBJECT('mode', 'normal');
-
To load the vector embeddings into HeatWave, use the
HEATWAVE_LOAD
routine:Press CTRL+C to copycall sys.HEATWAVE_LOAD(CAST(@dl_tables AS JSON), @options);
This creates and stores the vector embeddings in the specified vector store table.
-
Verify that embeddings are loaded in the vector store table:
Press CTRL+C to copyselect count(*) from VectorStoreTableName;
For example:
Press CTRL+C to copyselect count(*) from demo_embeddings;
If you see a numerical value in the output, your embeddings are successfully loaded in the table.
-
To view the details of the vector store table, use the following statement:
Press CTRL+C to copydescribe demo_embeddings;
The output shows the details of the table:
Press CTRL+C to copy+-------------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+---------------+------+-----+---------+-------+ | document_name | varchar(1024) | NO | | NULL | | | metadata | json | NO | | NULL | | | document_id | int unsigned | NO | PRI | NULL | | | segment_number | int unsigned | NO | PRI | NULL | | | segment | varchar(1024) | NO | | NULL | | | segment_embedding | vector(384) | NO | | NULL | | +-------------------+---------------+------+-----+---------+-------+