The VECTOR_STORE_LOAD routine generates vector embeddings for the specified files or folders, and loads the embeddings into a new vector store table.
To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.
mysql> CALL sys.VECTOR_STORE_LOAD('URI'[, options]);
options: JSON_OBJECT(keyvalue[, keyvalue]...)
keyvalue:
{
'format', 'Format'
|'schema_name', 'SchemaName'
|'table_name', 'TableName'
|'language', 'Language'
|'embed_model_id', 'ModelID'
|'description', 'Description'
|'ocr', {true|false}
}
The VECTOR_STORE_LOAD parameters are as follows:
- URI: specifies the uniform resource identifier (URI) of the files or folders to be ingested into the vector store. A URI is considered to be one of the following:
  - A glob pattern, if it contains at least one unescaped ? or * character.
  - A prefix, if it is not a pattern and ends with a / character, like a folder path.
  - A file path, if it is neither a glob pattern nor a prefix.
- options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:
  - format: specifies the format of the files to be loaded. Default value is auto_unstructured, which means all supported types of files are loaded. Possible values are pdf, pptx, ppt, txt, html, docx, doc, and auto_unstructured.
  - schema_name: specifies the name of the schema where the vector embeddings are to be loaded. By default, this procedure uses the current schema from the session.
  - table_name: specifies the name of the vector store table to create. By default, the routine generates a unique table name with the format vector_store_data_x, where x is a counter.
  - language: specifies the language of the text content in the files to be ingested into the vector store. To set the value of the language parameter, use the two-letter ISO 639-1 code for the language. Default value is en. To view the list of supported languages, see Section 5.4, “Supported LLM, Embedding Model, and Languages”.
  - embed_model_id: specifies the embedding model to use for encoding the text. Default value is multilingual-e5-small. To view the list of available embedding models, see In-Database Embedding Model.
  - description: specifies a description of the document collection being loaded. Default value is NULL.
  - ocr: specifies whether to enable or disable Optical Character Recognition (OCR). Default value is true, which means OCR is enabled by default. If set to false, OCR is disabled.
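The three URI forms described for the URI parameter can be illustrated with the calls below. This is a sketch that has not been run against a server: the folder URI and the *.pdf pattern are hypothetical variations on the demo-directory path, and it assumes that a prefix URI ingests the files found under that folder. Because no table_name is given, each call would load into its own auto-generated vector store table.

```sql
-- File path: no unescaped ? or * and no trailing /, so it names a single file.
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', NULL);

-- Prefix (hypothetical URI): not a pattern and ends with /, like a folder path.
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/', NULL);

-- Glob pattern (hypothetical URI): contains an unescaped * character.
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/*.pdf', NULL);
```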
- Specifying the file to ingest, using the current database, an auto-generated name for the vector store table, and default values for all options:
  mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', NULL);
- Specifying the file to ingest, using the current database, and specifying the name of the vector store table to be created:
  mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', '{"table_name": "demo_embeddings"}');
- Specifying additional options such as the schema name, table name, language, and table description in VECTOR_STORE_LOAD:
  mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/german_files/de*', '{"schema_name": "demo_db", "table_name": "german_embeddings", "language": "de", "description": "Vector store table containing German PDF files."}');
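The syntax defines options as a JSON_OBJECT of key-value pairs, while the examples in this topic pass JSON string literals. The following sketch shows the JSON_OBJECT form and sets the format and ocr options. The table name demo_embeddings_no_ocr is a hypothetical name chosen for illustration, and the call has not been run against a server.

```sql
-- Build the options with JSON_OBJECT instead of a JSON string literal.
CALL sys.VECTOR_STORE_LOAD(
  'file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf',
  JSON_OBJECT(
    'format', 'pdf',                         -- load PDF files only
    'table_name', 'demo_embeddings_no_ocr',  -- hypothetical table name
    'ocr', false                             -- disable OCR (enabled by default)
  )
);
```

Passing the unquoted literal false makes JSON_OBJECT emit a JSON boolean, which matches the {true|false} value shown for ocr in the syntax.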