The VECTOR_STORE_LOAD
routine generates
vector embedding for the specified files or folders that are
stored in the Object Storage bucket, and loads the embeddings
into a new vector store table.
This routine creates an asynchronous task which loads vector store tables in the background. It also returns a query that you can run to track the status of the vector store load task that is running in the background.
It is recommended that you create five or less vector store load tasks at a time. Too many tasks running at the same time might cause overloading issues.
mysql> call sys.VECTOR_STORE_LOAD('URI', [options]);
options: {
JSON_OBJECT('key','value'[,'key','value'] ...)
'key','value': {
['formats', JSON_ARRAY('Format1'[,'Format2'] ...)]
['schema_name', 'SchemaName']
['table_name', 'TableName']
['region', 'Region']
['task_name', 'TaskName']
['language', 'Language']
['description', 'Description']
['uris', JSON_ARRAY(JSON_OBJECT('uri','URI1','table_name','TableName1')[,JSON_OBJECT('uri','URI2','table_name','TableName2')] ...)]
['autopilot_disable_check', {true|false}]
}
}
Following are VECTOR_STORE_LOAD
parameters:
-
URI
: specifies the unique reference index (URI) or pre-authenticated request (PAR) of the Object Storage bucket files or folders to be ingested into the vector store.A URI is considered to be one of the following:
A glob pattern, if it contains at least one unescaped
?
or*
character.A prefix, if it is not a pattern and ends with a
/
character like a folder path.A file path, if it is neither a glob pattern nor a prefix.
To learn how to create PAR for your object storage, see Creating a PAR request in Object Storage.
-
options
: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:-
formats
: specifies the list of formats to be loaded. The supported file formats arepdf
,ppt
,txt
,html
, anddoc
. By default, the routine uses all the supported formats.If the routine detects multiple files with the same or different file formats in a single load, it creates a separate table for every format it finds. The table name for each format is the specified or default table name followed by the format:
TableName_Format
. schema_name
: specifies the name of the schema where the vector embeddings are to be loaded. By default, this procedure uses the current schema from the session.table_name
: specifies the name of the vector store table to create. By default, the routine generates a unique table name with formatvector_store_data_x
, wherex
is a counter.region
: specifies the region of the bucket. Default value is the region where the current DB System is running.task_name
: specifies a name for the loading task to be mentioned in the task status. Default value isVector Store Loader
.-
language
: specifies the text content language used in the files to be ingested into the vector store. To set the value of thelanguage
parameter, use the two-letterISO 639-1
code for the language. This parameter is supported in HeatWave9.0.1-u1
and later versions.Default value is
en
.For possible values, to view the list of supported languages, see Languages.
description
: specifies a description of document collection being loaded to be mentioned in the task status. Default value isNULL
.-
uris
: specifies a list of additional URIs to include along with an optional name for the vector store table to be created for the specified URI. Default value isNULL
.This parameter accepts the following values:
uri
: specifies the additional URI. If onlyuri
is provided, the routine uses the specified URI as an additional URI, and loads it into the main tableoptions.table_name
or the generated table with the unique table name.-
table_name
: if onlytable_name
is provided, the routine loads the specified vector store table into HeatWave.If both
uri
andtable_name
are provided, the routine loads the specified URI into the specified table.
autopilot_disable_check
: if set totrue
, turns autopilot checks off, so HeatWave GenAI does not perform checks when the load task is running. Therefore, checks such as memory checks for estimating memory needed for tables and checking memory capacity in HeatWave for the load to complete are not performed. Default value isfalse
.
-
-
Specifying the file to ingest using the URI in
VECTOR_STORE_LOAD
:call sys.VECTOR_STORE_LOAD('oci://demo_bucket@demo_namespace/demo_folder/demo_file.pdf', '{"table_name": "demo_embeddings"}');
-
Specifying the file to ingest using the PAR in
VECTOR_STORE_LOAD
:call sys.VECTOR_STORE_LOAD('https://demo.objectstorage.us-ashburn-1.oci.customer-oci.com/p/demo-url/n/demo/b/demo-bucket/o/demo_file.pdf', '{"table_name": "demo_embeddings"}');
-
Tracking the progress of a load task by running the task query displayed as output for the
VECTOR_STORE_LOAD
routine:select id, name, message, progress, status, scheduled_time,estimated_completion_time, estimated_remaining_time, progress_bar FROM mysql_task_management.task_status WHERE id=1\G
The output looks similar to the following:
id: 1 name: Vector Store Loader message: Task starting. progress: 0 status: RUNNING scheduled_time: 2024-07-02 14:42:38 estimated_completion_time: 2024-07-22 10:19:53 estimated_remaining_time: 52.50000 progress_bar: __________
-
Getting more details about the load task by querying the task logs for the given task id:
select * from mysql_task_management.task_log where task_id = 1;
The output looks similar to the following:
+------------------------------------+---------+----------------------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-----------+ | id | task_id | log_time | message | data | progress | status | +------------------------------------+---------+----------------------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-----------+ | 0x11EF799F5D99054288CC020017091C01 | 1 | 2024-09-23 11:31:24.884514 | Task created by user. | NULL | 0 | SCHEDULED | | 0x11EF799F5D99812188CC020017091C01 | 1 | 2024-09-23 11:31:24.887685 | Task starting. | {"engine_attributes": "{\"demo_embeddings\":\"'{\\n\\\"dialect\\\": {\\\"format\\\": \\\"pdf\\\", \\\"language\\\": \\\"en\\\", \\\"is_strict_mode\\\": false },\\n\\\"file\\\": [\\n{\\n \\\"name\\\": \\\"demo_folder/demo_file.pdf\\\",\\n \\\"bucket\\\": \\\"demo_bucket\\\",\\n \\\"region\\\": \\\"us-ashburn-1\\\",\\n \\\"namespace\\\": \\\"demo_namespace\\\"\\n}\\n]}'\"}"} | 0 | RUNNING | | 0x11EF799F6390FD9788CC020017091C01 | 1 | 2024-09-23 11:31:34.898219 | Loading in progress... | {"tables_to_load": "[{\"table_name\": \"demo_embeddings\", \"load_progress\": 10.0}]"} | 10 | RUNNING | | 0x11EF799F668C172F88CC020017091C01 | 1 | 2024-09-23 11:31:39.899271 | Loading in progress... | {"tables_to_load": "[{\"table_name\": \"demo_embeddings\", \"load_progress\": 40.0}]"} | 40 | RUNNING | | 0x11EF799F6987348588CC020017091C01 | 1 | 2024-09-23 11:31:44.900419 | Loading in progress... | {"tables_to_load": "[{\"table_name\": \"demo_embeddings\", \"load_progress\": 40.0}]"} | 40 | RUNNING | | 0x11EF799F6C82547D88CC020017091C01 | 1 | 2024-09-23 11:31:49.901634 | Loading in progress... | {"tables_to_load": "[{\"table_name\": \"demo_embeddings\", \"load_progress\": 40.0}]"} | 40 | RUNNING | +------------------------------------+---------+----------------------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-----------+