4.4.4 Running Retrieval-Augmented Generation

HeatWave retrieves content from the vector store and provides that content as context to the LLM. This process is called retrieval-augmented generation, or RAG, and it helps the LLM produce more relevant and accurate responses to your queries. The ML_MODEL_LOAD routine loads the LLM, and the ML_RAG routine runs RAG to generate responses to your queries.

If the vector store tables contain information in different languages, the ML_RAG routine filters the retrieved context using the embedding model name and the language that was used to ingest files into the vector store table. Both of these details are stored as metadata in the vector store tables.

Retrieving Context and Generating Relevant Content

To enter a natural-language query, retrieve the context, and generate accurate results using RAG, perform the following steps:

  1. To load the LLM in HeatWave memory, use the ML_MODEL_LOAD routine:

    call sys.ML_MODEL_LOAD("LLM", NULL);

    Replace LLM with the name of the LLM that you want to use. To view the lists of supported LLMs, see HeatWave In-Database LLMs and OCI Generative AI Service LLMs.

    For example:

    call sys.ML_MODEL_LOAD("mistral-7b-instruct-v1", NULL);

    This step is optional: the ML_RAG routine also loads the specified LLM. If you skip it, the first run takes a bit longer because ML_RAG must load the LLM before generating the output.

  2. To specify the table for retrieving the vector embeddings to use as context, set the @options session variable:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("DBName.VectorStoreTableName"), "model_options", JSON_OBJECT("language", "Language"));

    Replace the following:

    • DBName: the name of the database that contains the vector store table.

    • VectorStoreTableName: the name of the vector store table.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default language is en (English). To view the list of supported languages, see Languages.

      Note

      The language parameter is supported in HeatWave 9.0.1-u1 and later versions.

    For example:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("demo_db.demo_embeddings"), "model_options", JSON_OBJECT("language", "en"));

    To learn more about the available routine options, see ML_RAG Syntax.

  3. To define your natural-language query, set the session @query variable:

    set @query="AddYourQuery";

    Replace AddYourQuery with your natural-language query.

    For example:

    set @query="What is AutoML?";
  4. To retrieve the augmented prompt, use the ML_RAG routine:

    call sys.ML_RAG(@query,@output,@options);
  5. Print the output:

    select JSON_PRETTY(@output);

    The text-based content generated by the LLM in response to your query is printed as output. The output generated by RAG consists of two parts:

    • The text section contains the text-based content generated by the LLM as a response for your query.

    • The citations section shows the segments and documents that the LLM referred to as context.

    The output looks similar to the following:

    {
      "text": " AutoML is a machine learning technique that automates the process
      of selecting, training, and evaluating machine learning models. It involves
      using algorithms and techniques to automatically identify the best model
      for a given dataset and optimize its hyperparameters without requiring manual
      intervention from data analysts or ML practitioners. AutoML can be used in
      various stages of the machine learning pipeline, including data preprocessing,
      feature engineering, model selection, hyperparameter tuning,
      and model evaluation.",
      "citations": [
        {
          "segment": "Oracle AutoML also produces high quality models very efficiently,
          which is achieved through a scalable design and intelligent choices that
          reduce trials at each stage in the pipeline.\n Scalable design: The Oracle
          AutoML pipeline is able to exploit both HeatWave internode and intranode
          parallelism, which improves scalability and reduces runtime.",
          "distance": 0.4262576103210449,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        },
        {
          "segment": "The HeatWave AutoML ML_TRAIN routine leverages Oracle AutoML
          technology to automate the process of training a machine learning model.
          Oracle AutoML replaces the laborious and time consuming tasks of the data
          analyst whose workflow is as follows:\n1. Selecting a model from a large
          number of viable candidate models.\n2.\n99",
          "distance": 0.4311879277229309,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        },
        {
          "segment": "3.1 HeatWave AutoML Features HeatWave AutoML makes it easy to
          use machine learning, whether you are a novice user or an experienced ML
          practitioner. You provide the data, and HeatWave AutoML analyzes the
          characteristics of the data and creates an optimized machine learning model
          that you can use to generate predictions and explanations.",
          "distance": 0.4441382884979248,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        }
      ],
      "vector_store": [
        "demo_db.demo_embeddings"
      ]
    }

    To continue running more queries in the same session, repeat steps 3 to 5.
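
    The value left in @output by the steps above is a plain JSON document, so after fetching it you can post-process it client-side. The following is a minimal sketch, assuming an output shaped like the example above with illustrative segments, distances, and document names; it treats a smaller distance as a closer semantic match and ranks the citations accordingly:

    ```python
    import json

    # Trimmed stand-in for the JSON document returned in @output
    # (the real segments, distances, and document names come from your
    # vector store; these values are illustrative only).
    raw_output = json.dumps({
        "text": " AutoML is a machine learning technique that automates the "
                "process of selecting, training, and evaluating models.",
        "citations": [
            {"segment": "Oracle AutoML also produces high quality models ...",
             "distance": 0.4263, "document_name": "doc_a.pdf"},
            {"segment": "The HeatWave AutoML ML_TRAIN routine ...",
             "distance": 0.4312, "document_name": "doc_b.pdf"},
        ],
        "vector_store": ["demo_db.demo_embeddings"],
    })

    output = json.loads(raw_output)
    answer = output["text"].strip()

    # Treating a smaller distance as a closer semantic match, rank the
    # citations so the best-supported source comes first.
    ranked = sorted(output["citations"], key=lambda c: c["distance"])
    sources = [c["document_name"] for c in ranked]
    print(answer)
    print(sources)
    ```

    This kind of post-processing is useful when you want to display the generated answer alongside its sources, ordered by relevance.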

Retrieving Context Without Generating Content

To enter a natural-language query and retrieve the context without generating a response for the query, perform the following steps:

  1. To load the LLM in HeatWave memory, use the ML_MODEL_LOAD routine:

    call sys.ML_MODEL_LOAD("LLM", NULL);

    Replace LLM with the name of the LLM that you want to use. To view the lists of supported LLMs, see HeatWave In-Database LLMs and OCI Generative AI Service LLMs.

    For example:

    call sys.ML_MODEL_LOAD("mistral-7b-instruct-v1", NULL);

    This step is optional: the ML_RAG routine also loads the specified LLM. If you skip it, the first run takes a bit longer because ML_RAG must load the LLM before generating the output.

  2. To specify the table for retrieving the vector embeddings and to skip generation of content, set the @options session variable:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("DBName.VectorStoreTableName"), "skip_generate", true, "model_options", JSON_OBJECT("language", "Language"));

    Replace the following:

    • DBName: the name of the database that contains the vector store table.

    • VectorStoreTableName: the name of the vector store table.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default language is en (English). To view the list of supported languages, see Languages.

      Note

      The language parameter is supported in HeatWave 9.0.1-u1 and later versions.

    For example:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("demo_db.demo_embeddings"), "skip_generate", true, "model_options", JSON_OBJECT("language", "en"));
  3. To define your natural-language query, set the session @query variable:

    set @query="AddYourQuery";

    Replace AddYourQuery with your natural-language query.

    For example:

    set @query="What is AutoML?";
  4. To retrieve the augmented prompt, use the ML_RAG routine:

    call sys.ML_RAG(@query,@output,@options);
  5. Print the output:

    select JSON_PRETTY(@output);

    The semantically similar text segments used as context for the query, and the names of the documents in which they were found, are printed as output.

    The output looks similar to the following:

    {
        "citations": [
        {
          "segment": "Oracle AutoML also produces high quality models very efficiently,
          which is achieved through a scalable design and intelligent choices that
          reduce trials at each stage in the pipeline.\n Scalable design: The Oracle
          AutoML pipeline is able to exploit both HeatWave internode and intranode
          parallelism, which improves scalability and reduces runtime.",
          "distance": 0.4262576103210449,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        },
        {
          "segment": "The HeatWave AutoML ML_TRAIN routine leverages Oracle AutoML
          technology to automate the process of training a machine learning model.
          Oracle AutoML replaces the laborious and time consuming tasks of the data
          analyst whose workflow is as follows:\n1. Selecting a model from a large
          number of viable candidate models.\n2.\n99",
          "distance": 0.4311879277229309,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        },
        {
          "segment": "3.1 HeatWave AutoML Features HeatWave AutoML makes it easy to
          use machine learning, whether you are a novice user or an experienced ML
          practitioner. You provide the data, and HeatWave AutoML analyzes the
          characteristics of the data and creates an optimized machine learning model
          that you can use to generate predictions and explanations.",
          "distance": 0.4441382884979248,
          "document_name": "https://objectstorage.Region.oraclecloud.com/n/Namespace/b/BucketName/o/Path/Filename"
        }
      ],
      "vector_store": [
        "demo_db.demo_embeddings"
      ]
    }

    To continue running more queries in the same session, repeat steps 3 to 5.
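
    With skip_generate set to true, the steps above return citations only, which you can assemble into a prompt of your own. The following is a minimal sketch, assuming the output shape shown above with illustrative values; the 0.44 distance threshold is purely illustrative, not a HeatWave default:

    ```python
    import json

    # Trimmed stand-in for a retrieval-only @output (skip_generate = true):
    # citations are present, but there is no generated "text" field.
    raw_output = json.dumps({
        "citations": [
            {"segment": "Oracle AutoML also produces high quality models ...",
             "distance": 0.4263, "document_name": "doc_a.pdf"},
            {"segment": "3.1 HeatWave AutoML Features ...",
             "distance": 0.4441, "document_name": "doc_c.pdf"},
        ],
        "vector_store": ["demo_db.demo_embeddings"],
    })

    output = json.loads(raw_output)

    # Keep only segments under an illustrative relevance threshold, then join
    # them into a single context block for a downstream prompt.
    THRESHOLD = 0.44
    context = "\n\n".join(
        c["segment"] for c in output["citations"] if c["distance"] < THRESHOLD
    )
    print(context)
    ```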

Running Batch Queries

To run multiple RAG queries in parallel, use the ML_RAG_TABLE routine. This method is faster than running the ML_RAG routine multiple times.

Note

The ML_RAG_TABLE routine is supported in HeatWave 9.0.1-u1 and later versions.

To run batch queries using ML_RAG_TABLE, perform the following steps:

  1. To load the LLM in HeatWave memory, use the ML_MODEL_LOAD routine:

    call sys.ML_MODEL_LOAD("LLM", NULL);

    Replace LLM with the name of the LLM that you want to use. To view the lists of supported LLMs, see HeatWave In-Database LLMs and OCI Generative AI Service LLMs.

    For example:

    call sys.ML_MODEL_LOAD("mistral-7b-instruct-v1", NULL);

    This step is optional: the ML_RAG_TABLE routine also loads the specified LLM. If you skip it, the first run takes a bit longer because ML_RAG_TABLE must load the LLM before generating the output.

  2. To specify the table for retrieving the vector embeddings to use as context, set the @options session variable:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("DBName.VectorStoreTableName"), "model_options", JSON_OBJECT("language", "Language"));

    Replace the following:

    • DBName: the name of the database that contains the vector store table.

    • VectorStoreTableName: the name of the vector store table.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default language is en (English). To view the list of supported languages, see Languages.

    For example:

    set @options = JSON_OBJECT("vector_store", JSON_ARRAY("demo_db.demo_embeddings"), "model_options", JSON_OBJECT("language", "en"));

    To learn more about the available routine options, see ML_RAG_TABLE Syntax.

  3. In the ML_RAG_TABLE routine, specify the table column that contains the input queries and the column in which to store the generated outputs:

    call sys.ML_RAG_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", @options);

    Replace the following:

    • InputDBName: the name of the database that contains the table column where your input queries are stored.

    • InputTableName: the name of the table that contains the column where your input queries are stored.

    • InputColumn: the name of the column that contains input queries.

    • OutputDBName: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.

    • OutputTableName: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.

    • OutputColumn: the name for the new column where you want to store the output generated for the input queries.

    For example:

    call sys.ML_RAG_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", @options);
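
Conceptually, ML_RAG_TABLE applies one RAG step per row of the input column and writes one generated answer per row to the output column; HeatWave parallelizes this work internally. Its effect can be sketched as a per-row loop, where rag below is a hypothetical stand-in for a single ML_RAG call:

```python
# Hypothetical stand-in for one ML_RAG call; the real routine retrieves
# context from the vector store and prompts the LLM.
def rag(query: str) -> str:
    return f"answer for: {query}"

# Rows of the input column (demo_db.input_table.Input in the example).
input_rows = ["What is AutoML?", "What is a vector store?"]

# Effect of ML_RAG_TABLE: one output row per input row, stored in the
# output column. HeatWave runs these in parallel; the loop is only a model.
output_rows = [rag(q) for q in input_rows]
print(output_rows)
```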