
7.2.3 VECTOR_STORE_LOAD

The VECTOR_STORE_LOAD routine generates vector embeddings for the specified files or folders, and loads the embeddings into a new vector store table.

This topic contains the following sections:

  • VECTOR_STORE_LOAD Syntax

  • Syntax Examples

To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.

VECTOR_STORE_LOAD Syntax

mysql> CALL sys.VECTOR_STORE_LOAD('URI'[, options]);

options: JSON_OBJECT(keyvalue[, keyvalue]...)
keyvalue: 
{
  'format', 'Format' 
  |'schema_name', 'SchemaName'
  |'table_name', 'TableName'
  |'language', 'Language'
  |'embed_model_id', 'ModelID'
  |'description', 'Description'
  |'ocr', {true|false}
}

The VECTOR_STORE_LOAD parameters are as follows:

  • URI: specifies the uniform resource identifier (URI) of the files or folders to be ingested into the vector store.

    A URI is considered to be one of the following:

    • A glob pattern, if it contains at least one unescaped ? or * character.

    • A prefix, if it is not a glob pattern and ends with a / character, as a folder path does.

    • A file path, if it is neither a glob pattern nor a prefix.

  • options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:

    • format: specifies the format of files to be loaded. Default value is auto_unstructured, which means all supported types of files are loaded. Possible values are pdf, pptx, ppt, txt, html, docx, doc, and auto_unstructured.

    • schema_name: specifies the name of the schema where the vector embeddings are to be loaded. By default, the routine uses the current schema of the session.

    • table_name: specifies the name of the vector store table to create. By default, the routine generates a unique table name with format vector_store_data_x, where x is a counter.

    • language: specifies the text content language used in the files to be ingested into the vector store. To set the value of the language parameter, use the two-letter ISO 639-1 code for the language.

      Default value is en.

      For the list of supported languages, see Section 5.4, “Supported LLM, Embedding Model, and Languages”.

    • embed_model_id: specifies the embedding model to use for encoding the text. Default value is multilingual-e5-small.

      For the list of available embedding models, see In-Database Embedding Model.

    • description: specifies a description of the document collection being loaded. Default value is NULL.

    • ocr: specifies whether to enable Optical Character Recognition (OCR). Default value is true, which enables OCR. Set the value to false to disable OCR.
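The three rules listed under the URI parameter (glob pattern, prefix, file path) can be sketched in Python as follows. This is only an illustration and not the routine's actual implementation: the function name classify_uri is hypothetical, and treating a backslash as the escape character for ? and * is an assumption, because the rules above do not say how a character is escaped.

```python
def classify_uri(uri: str) -> str:
    """Return 'glob', 'prefix', or 'file' for a URI, following the three rules above."""
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "\\":
            # Assumption: a backslash escapes the next character,
            # so skip both the backslash and the character after it.
            i += 2
            continue
        if ch in "?*":
            # At least one unescaped ? or * makes the URI a glob pattern.
            return "glob"
        i += 1
    if uri.endswith("/"):
        # Not a pattern and ends with /, so it is treated as a prefix.
        return "prefix"
    # Neither a glob pattern nor a prefix.
    return "file"


if __name__ == "__main__":
    # Illustrative URIs modeled on the examples in this topic.
    for candidate in (
        "file:///var/lib/mysql-files/german_files/de*",
        "file:///var/lib/mysql-files/demo-directory/",
        "file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf",
    ):
        print(candidate, "->", classify_uri(candidate))
```

The checks run in the same order as the rules: the pattern test comes first, so a URI such as de*/ is classified as a glob pattern even though it ends with a / character.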

Syntax Examples

  • Specifying the file to ingest, using the current database, auto-generated name for the vector store table, and default values for all options:

    mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', NULL);

  • Specifying the file to ingest, using the current database, and specifying the name of the vector store table to be created:

    mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', '{"table_name": "demo_embeddings"}');

  • Specifying additional options such as the schema name, table name, language, and table description in VECTOR_STORE_LOAD:

    mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/german_files/de*', '{"schema_name": "demo_db", "table_name": "german_embeddings", "language": "de", "description": "Vector store table containing German PDF files."}');
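When the options are assembled in application code, generating the JSON string programmatically avoids quoting mistakes in statements like the ones above. The following Python sketch builds the text of a CALL statement from a dictionary of options. The helper names sql_quote and build_vector_store_load_call are hypothetical. The quoting assumes the default MySQL SQL mode, in which a backslash acts as an escape character inside string literals. The set of accepted keys is copied from the syntax section of this topic. For real applications, binding the two arguments as parameters through a database connector is usually preferable to building SQL text.

```python
import json
from typing import Optional

# Option keys listed in the VECTOR_STORE_LOAD syntax section of this topic.
ALLOWED_KEYS = {
    "format", "schema_name", "table_name", "language",
    "embed_model_id", "description", "ocr",
}


def sql_quote(value: str) -> str:
    """Quote a string as a MySQL string literal (assumes the default SQL mode)."""
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def build_vector_store_load_call(uri: str, options: Optional[dict] = None) -> str:
    """Return the text of a CALL sys.VECTOR_STORE_LOAD statement."""
    if options is None:
        # No options: pass NULL so that all defaults apply.
        return f"CALL sys.VECTOR_STORE_LOAD({sql_quote(uri)}, NULL);"
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported option(s): {sorted(unknown)}")
    # json.dumps renders Python True/False as the JSON literals true/false.
    options_json = json.dumps(options)
    return f"CALL sys.VECTOR_STORE_LOAD({sql_quote(uri)}, {sql_quote(options_json)});"


if __name__ == "__main__":
    print(build_vector_store_load_call(
        "file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf",
        {"table_name": "demo_embeddings", "ocr": False},
    ))
```

Rejecting unknown keys in the helper is a design choice of this sketch, made so that a misspelled option name fails early on the client side. It does not describe how the routine itself handles unrecognized keys.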