4.3.2 Summarizing Content

This section describes how to summarize existing content using HeatWave GenAI.

Before You Begin

  • Connect to your HeatWave Database System.

  • To run batch queries, add the natural-language queries to a column in a new or existing table.
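
As a minimal sketch of this preparation step, the following statements create an input table and add two texts to summarize. The names demo_db, input_table, and Input match the batch example later in this section. The id column, the column types, and the sample rows are assumptions made for illustration. The primary key is included as a precaution; see ML_GENERATE_TABLE Syntax for the actual input table requirements.

```sql
-- Hypothetical setup: names follow the demo_db example used in this section.
CREATE DATABASE IF NOT EXISTS demo_db;

CREATE TABLE IF NOT EXISTS demo_db.input_table (
  id INT AUTO_INCREMENT PRIMARY KEY,  -- assumed key column, added for illustration
  `Input` TEXT NOT NULL               -- one text to summarize per row
);

-- Sample rows (illustrative only); each row becomes one summarization query.
INSERT INTO demo_db.input_table (`Input`) VALUES
  ('Machine learning is a subset of AI that allows computers to learn from data without being explicitly programmed.'),
  ('AI is already being used in a variety of industries, including healthcare, finance, and transportation.');
```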

Summarizing Content

To summarize text, perform the following steps:

  1. To load the LLM in HeatWave memory, use the ML_MODEL_LOAD routine:

    call sys.ML_MODEL_LOAD('LLM', NULL);

    Replace LLM with the name of the LLM that you want to use.

    For example:

    call sys.ML_MODEL_LOAD('mistral-7b-instruct-v1', NULL);

    This step is optional because the ML_GENERATE routine also loads the specified LLM. However, if the LLM is not already loaded, the first ML_GENERATE call takes longer because it must load the LLM before it generates the output.

  2. To define the text that you want to summarize, set the @text session variable:

    set @text="TextToSummarize";

    Replace TextToSummarize with the text that you want to summarize.

    For example:

    set @text="Artificial Intelligence (AI) is a rapidly growing field that has the potential to
    revolutionize how we live and work. AI refers to the development of computer systems that can
    perform tasks that typically require human intelligence, such as visual perception, speech
    recognition, decision-making, and language translation.\n\nOne of the most significant developments in
    AI in recent years has been the rise of machine learning, a subset of AI that allows computers to learn
    from data without being explicitly programmed. Machine learning algorithms can analyze vast amounts
    of data and identify patterns, making them increasingly accurate at predicting outcomes and making
    decisions.\n\nAI is already being used in a variety of industries, including healthcare, finance, and
    transportation. In healthcare, AI is being used to develop personalized treatment plans for patients
    based on their medical history and genetic makeup. In finance, AI is being used to detect fraud and make
    investment recommendations. In transportation, AI is being used to develop self-driving cars and improve
    traffic flow.\n\nDespite the many benefits of AI, there are also concerns about its potential impact on
    society. Some worry that AI could lead to job displacement, as machines become more capable of performing
    tasks traditionally done by humans. Others worry that AI could be used for malicious ";

  3. To generate the text summary, pass the original text to the LLM using the ML_GENERATE routine, with the task parameter set to summarization:

    select sys.ML_GENERATE(@text, JSON_OBJECT("task", "summarization", "model_id", "LLM", "language", "Language"));

    Replace the following:

    • LLM: the LLM to use, which must be the same as the one you loaded in step 1. To view the list of supported LLMs, see LLMs.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default is en (English). To view the list of supported languages, see Languages.

      Note

      The language parameter is supported in HeatWave 9.0.1-u1 and later versions.

    For example:

    select sys.ML_GENERATE(@text, JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v1", "language", "en"));

    A text summary generated by the LLM in response to your query is printed as output. It looks similar to the text output shown below:

    | {"text": " Artificial Intelligence (AI) is a rapidly growing field with the potential to revolutionize
    how we live and work. It refers to computer systems that can perform tasks requiring human intelligence, such
    as visual perception, speech recognition, decision-making, and language translation. Machine learning, a
    subset of AI, allows computers to learn from data without being explicitly programmed, making them increasingly
    accurate at predicting outcomes and making decisions. AI is already being used in healthcare, finance, and
    transportation industries for personalized treatment plans, fraud detection, and self-driving cars. However,
    there are concerns about its potential impact on society, including job displacement and malicious use."} |
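
Because the output is a JSON object, the summary text can be separated from the wrapper with the standard MySQL JSON functions. The following is a minimal sketch, assuming @text is set as in step 2 and that the result contains a text key as in the sample output above. The @summary variable name is hypothetical.

```sql
-- Sketch: keep only the summary text from the JSON returned by ML_GENERATE.
-- Assumes @text was set as in step 2; @summary is a hypothetical variable name.
set @summary = sys.ML_GENERATE(@text, JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v1", "language", "en"));

-- Extract the value of the "text" key and remove the surrounding JSON quotes.
select JSON_UNQUOTE(JSON_EXTRACT(@summary, '$.text')) as summary;
```

Storing the result in a session variable avoids calling the LLM a second time if you need the same summary again.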

Running Batch Queries

To run multiple summarization queries in parallel, use the ML_GENERATE_TABLE routine. This method is faster than running the ML_GENERATE routine multiple times.

Note

The ML_GENERATE_TABLE routine is supported in HeatWave 9.0.1-u1 and later versions.

To run batch queries using ML_GENERATE_TABLE, perform the following steps:

  1. To load the LLM in HeatWave memory, use the ML_MODEL_LOAD routine:

    call sys.ML_MODEL_LOAD('LLM', NULL);

    Replace LLM with the name of the LLM that you want to use. To view the list of supported LLMs, see LLMs.

    For example:

    call sys.ML_MODEL_LOAD('mistral-7b-instruct-v1', NULL);

    This step is optional because the ML_GENERATE_TABLE routine also loads the specified LLM. However, if the LLM is not already loaded, the first ML_GENERATE_TABLE call takes longer because it must load the LLM before it generates the output.

  2. In the ML_GENERATE_TABLE routine, specify the table column that contains the input queries and the table column in which to store the generated text summaries:

    call sys.ML_GENERATE_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", JSON_OBJECT("task", "summarization", "model_id", "LLM", "language", "Language"));

    Replace the following:

    • InputDBName: the name of the database that contains the table column where your input queries are stored.

    • InputTableName: the name of the table that contains the column where your input queries are stored.

    • InputColumn: the name of the column that contains input queries.

    • OutputDBName: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.

    • OutputTableName: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.

    • OutputColumn: the name for the new column where you want to store the output generated for the input queries.

    • LLM: the LLM to use, which must be the same as the LLM you loaded in the previous step.

    • Language: the two-letter ISO 639-1 code for the language you want to use. The default is en (English). To view the list of supported languages, see Languages.

    For example:

    call sys.ML_GENERATE_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v1", "language", "en"));

    To learn more about the available routine options, see ML_GENERATE_TABLE Syntax.
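
After the batch call completes, you can review the stored results with an ordinary query. The following is a minimal sketch based on the example above. It assumes that each value in the Output column is a JSON object with a text key, like the ML_GENERATE output shown earlier; verify the actual column format in ML_GENERATE_TABLE Syntax before relying on it.

```sql
-- Sketch: review the batch results written by ML_GENERATE_TABLE.
-- Assumes the example call above completed and that each Output value is JSON
-- with a "text" key (an assumption; check the routine's documented output format).
select JSON_UNQUOTE(JSON_EXTRACT(`Output`, '$.text')) as summary
from demo_db.output_table
limit 5;
```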