5.6.2 Ingesting Files into a Vector Store

This section describes how to generate vector embeddings for files or folders stored on the local filesystem, and load the embeddings into a vector store table.

The following sections in this topic describe how to ingest files into a vector store:

Before You Begin

  • Review the GenAI requirements and privileges.

  • Place the files that you want to load in the vector store directory that you specified in the MySQL AI installer.

    Vector store can ingest files in the following formats: PDF, PPTX, PPT, TXT, HTML, DOCX, and DOC.

    To test the steps in this topic, create a folder demo-directory inside the vector store directory /var/lib/mysql-files for storing files that you want to ingest into the vector store. Then, download and place the MySQL HeatWave user guide PDF in the demo-directory folder.

  • To create and store vector store tables using the steps described in this topic, you can create a new database demo_db:

    CREATE DATABASE demo_db;
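
Before loading files, it can help to confirm which directory the server permits for file access. The following is a minimal sketch, assuming the vector store directory is the one exposed through the secure_file_priv server system variable (such as /var/lib/mysql-files in this topic's examples):

```sql
-- Show the directory the server allows for file import operations.
SHOW VARIABLES LIKE 'secure_file_priv';

-- Confirm that the demo database exists before creating tables in it.
SHOW DATABASES LIKE 'demo_db';
```

If the first statement returns a path other than the one where you placed your files, move the files or adjust the file URIs used in the later steps accordingly.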

Ingesting Files into a Vector Store

The VECTOR_STORE_LOAD routine creates vector embeddings and loads them into the vector store. To ingest the source files, perform the following steps:

  1. To create the vector store table, use a new or existing database:

    mysql> USE DBName;

    Replace DBName with the database name.

    For example:

    mysql> USE demo_db;

  2. Optionally, to specify a name for the vector store table and language to use, set the @options variable:

    mysql> SET @options = JSON_OBJECT("table_name", "VectorStoreTableName", "language", "Language");

    Replace the following:

    • VectorStoreTableName: the name you want for the vector store table.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default is en (English). To view the list of supported languages, see Languages.

    For example:

    mysql> SET @options = JSON_OBJECT("table_name", "demo_embeddings", "language", "en");

    To learn more about the available routine options, see VECTOR_STORE_LOAD Syntax.

  3. To import a file from the local filesystem and create a vector store table, use the VECTOR_STORE_LOAD routine:

    mysql> CALL sys.VECTOR_STORE_LOAD("file://FilePath", @options);

    Replace FilePath with the uniform resource identifier (URI) of the files or directories to be ingested into the vector store. The URI is interpreted as one of the following:

    • A glob pattern, if it contains at least one unescaped ? or * character.

    • A prefix, if it is not a pattern and ends with a / character like a folder path.

    • A file path, if it is neither a glob pattern nor a prefix.

    Note

    Ensure that the documents to load are in the directory that you designated for the vector store, either during MySQL AI installation or with the secure_file_priv server system variable.

    For example:

    mysql> CALL sys.VECTOR_STORE_LOAD("file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf", @options);

    This loads the specified file or files from the specified directory into the vector store table.

  4. After the task is completed, verify that embeddings are loaded in the vector store table:

    mysql> SELECT COUNT(*) FROM VectorStoreTableName;

    For example:

    mysql> SELECT COUNT(*) FROM demo_embeddings;

    If the output shows a count greater than zero, the embeddings are loaded in the vector store table.

  5. To view the details of the vector store table, use the following statement:

    mysql> DESCRIBE demo_embeddings;
    +-------------------+---------------+------+-----+---------+-------+
    | Field             | Type          | Null | Key | Default | Extra |
    +-------------------+---------------+------+-----+---------+-------+
    | document_name     | varchar(1024) | NO   |     | NULL    |       |
    | metadata          | json          | NO   |     | NULL    |       |
    | document_id       | int unsigned  | NO   | PRI | NULL    |       |
    | segment_number    | int unsigned  | NO   | PRI | NULL    |       |
    | segment           | varchar(1024) | NO   |     | NULL    |       |
    | segment_embedding | vector(384)   | NO   |     | NULL    |       |
    +-------------------+---------------+------+-----+---------+-------+
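
The steps above load a single file. As a sketch of the prefix and glob forms of the URI described in step 3, the following loads every file in the demo folder and then counts the segments stored per document. The table name demo_folder_embeddings is a hypothetical name chosen for illustration; the column names come from the DESCRIBE output above.

```sql
USE demo_db;

-- Hypothetical table name for this illustration; language falls back to its default.
SET @options = JSON_OBJECT('table_name', 'demo_folder_embeddings');

-- Prefix form: the URI ends with /, so it is treated as a prefix (folder path).
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/', @options);

-- Glob form (alternative): an unescaped * makes the URI a pattern.
-- CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/*.pdf', @options);

-- Count the segments stored for each ingested document.
SELECT document_name, COUNT(*) AS segment_count
FROM demo_folder_embeddings
GROUP BY document_name;
```

A document that does not appear in the result has no segments in the table, which is a more specific check than the overall row count.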

Cleaning Up

If you created a new database for testing the steps in this topic, delete the database to free up space:

mysql> DROP DATABASE demo_db;
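
If you loaded the embeddings into an existing database instead, dropping the whole database is not appropriate. A minimal sketch, assuming the vector store table can be removed like any other table and that you used the table name from the examples in this topic:

```sql
-- Run this in the database that holds the vector store table.
DROP TABLE IF EXISTS demo_embeddings;
```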