As of MySQL 9.4.1, MySQL HeatWave GenAI lets you generate text summaries for unstructured files available in Object Storage. You must first ingest these documents into vector store tables using Auto Parallel Load. The generated table contains text segments that serve as input text for the ML_GENERATE and ML_GENERATE_TABLE routines, which generate summaries of the content available in the ingested files.
Before you begin, complete the following prerequisites:
- Review the requirements to set up a vector store.
- To generate a summary for a single file, create an Object Storage bucket named demo_bucket. Download the MySQL HeatWave technical brief PDF, then upload it to demo_bucket. To generate summaries for multiple files, also download the MySQL HeatWave on AWS brief PDF and upload it to demo_bucket.
- Complete the steps to Ingest Files Using Auto Parallel Load with the split_by option set to document, which creates one text segment per document in the resulting vector store table.
  If a file contains more than 1 million characters, all content beyond this limit is truncated. In such cases, use an alternative split_by value such as page or paragraph. You can later concatenate the resulting text segments to reconstruct the full document.
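Concatenating segments, as mentioned in the last prerequisite, could look like the following. This is a minimal sketch, not a documented procedure: it assumes the table was loaded with split_by set to page or paragraph, and that the vector store table has a segment_number column giving each segment's position within its document (an assumption; only document_id and segment appear in this topic). The session limit of 4194304 bytes is an arbitrary illustrative value.

```sql
-- GROUP_CONCAT() truncates its result at group_concat_max_len bytes
-- (1024 by default), so raise the limit for this session first.
SET SESSION group_concat_max_len = 4194304;

-- Rebuild one document from its segments, in segment order.
-- Assumption: segment_number orders the segments within a document.
SELECT GROUP_CONCAT(segment ORDER BY segment_number SEPARATOR '\n')
  INTO @document
  FROM demo_db.demo_embeddings_apl
 WHERE document_id = 0;
```

The result of GROUP_CONCAT() is also bounded by max_allowed_packet, and this topic does not state whether the selected LLM accepts input of that size.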
To generate a summary for a single file, perform the following steps:
- Copy the text segment of the file from the vector store table to the @document variable:
  mysql> SELECT segment INTO @document FROM DBName.VectorStoreTableName WHERE document_id=0;
  Replace the following:
  - DBName: the name of the database.
  - VectorStoreTableName: the name of the vector store table.
  For example:
  mysql> SELECT segment INTO @document FROM demo_db.demo_embeddings_apl WHERE document_id=0;
- Generate a summary of the text segment using the ML_GENERATE routine:
  mysql> SELECT sys.ML_GENERATE(@document, JSON_OBJECT("task", "summarization", "model_id", "ModelID", "max_tokens", MaxTokens));
  Replace the following:
  - ModelID: the LLM to use.
  - MaxTokens: the maximum number of tokens to generate, chosen according to how long you want the summary to be.
  For example:
  mysql> SELECT sys.ML_GENERATE(@document, JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v3", "max_tokens", 400));
  It might take a few minutes for this call to complete. When it completes, the text summary generated by the LLM is printed as output. The output looks similar to the following:
  +---------------------------------------------------------------------------------------------------------------------------+
  | sys.ML_GENERATE(@document, JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v3", "max_tokens", 400)) |
  +---------------------------------------------------------------------------------------------------------------------------+
  | {"text": " MySQL HeatWave is a fully managed cloud service that provides integrated, automated, and secure generative AI and machine learning (ML) for transactions and lakehouse scale analytics. It offers unmatched performance and price-performance without the complexity, latency, risks, and cost of ETL duplication. MySQL HeatWave can be deployed in multiple clouds including OCI, AWS, Azure, and hybrid environments. The service is designed to overcome the limitations of traditional data warehouse, analytics, lakehouse, machine learning, Generative AI, and vector store environments that use periodic long-running ETL batch jobs to refresh the data.\n\nThe text provides an overview of MySQL HeatWave, a cloud service offered by Oracle. The service offers integrated, automated, and secure generative AI and machine learning for transactions and lakehouse scale analytics. It is designed to overcome the limitations of traditional data warehouse, analytics, lakehouse, machine learning, Generative AI, and vector store environments that use periodic long-running ETL batch jobs to refresh the data. MySQL HeatWave can be deployed in multiple clouds including OCI, AWS, Azure, and hybrid environments. The service offers unmatched performance and price-performance without the complexity, latency, risks, and cost of ETL duplication. The document also provides a table of contents for a more detailed technical brief on MySQL HeatWave."} |
  +---------------------------------------------------------------------------------------------------------------------------+
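The routine returns a JSON document whose summary sits under the text key, as the sample output shows. If you only want the plain summary text, one option is to extract that member with the standard MySQL JSON functions. This is a minimal sketch; the @summary variable and the summary_text alias are illustrative names, not part of the routine.

```sql
-- Keep the JSON document returned by the routine in a user variable.
SELECT sys.ML_GENERATE(@document, JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v3", "max_tokens", 400)) INTO @summary;

-- Extract the "text" member and strip the JSON quoting and escape sequences.
SELECT JSON_UNQUOTE(JSON_EXTRACT(@summary, '$.text')) AS summary_text\G
```

Storing the result in a variable first avoids calling the LLM a second time when you want to inspect the output in more than one way.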
To generate summaries for multiple files, perform the following steps:
- Using the database name, vector store table name, and document IDs, copy the text segments of the files from the vector store table to different variables:
  mysql> SELECT segment INTO @document1 FROM DBName.VectorStoreTableName WHERE document_id=DocumentID;
  mysql> SELECT segment INTO @document2 FROM DBName.VectorStoreTableName WHERE document_id=DocumentID;
  ...
  These segments can be from the same or different vector store tables.
  For example:
  mysql> SELECT segment INTO @document1 FROM demo_db.demo_embeddings_apl WHERE document_id=0;
  mysql> SELECT segment INTO @document2 FROM demo_db.demo_embeddings_apl WHERE document_id=1;
- Add these text segments to a table:
  mysql> CREATE TABLE input_table (id INT AUTO_INCREMENT, Input TEXT, PRIMARY KEY (id));
  mysql> INSERT INTO input_table (Input) VALUES(@document1);
  mysql> INSERT INTO input_table (Input) VALUES(@document2);
- Generate summaries of the text segments using the ML_GENERATE_TABLE routine:
  mysql> CALL sys.ML_GENERATE_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", JSON_OBJECT("task", "summarization", "model_id", "ModelID", "language", "Language"));
  Replace the following:
  - InputDBName: the name of the database that contains the table column where your input queries are stored.
  - InputTableName: the name of the table that contains the column where your input queries are stored.
  - InputColumn: the name of the column that contains input queries.
  - OutputDBName: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.
  - OutputTableName: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.
  - OutputColumn: the name for the new column where you want to store the output generated for the input queries.
  - ModelID: the LLM to use.
  - Language: the two-letter ISO 639-1 code for the language you want to use. The default language is en, which is English. To view the list of supported languages, see Languages.
  For example:
  mysql> CALL sys.ML_GENERATE_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("task", "summarization", "model_id", "mistral-7b-instruct-v3", "language", "en"));
  It might take a few minutes for this call to complete.
- View the contents of the output table:
  mysql> SELECT * FROM output_table\G
  *************************** 1. row ***************************
  id: 1
  Output: {"text": "\nThe text is a business and technical brief for MySQL HeatWave on AWS, a fully managed cloud service that provides integrated, automated, and secure generative AI and machine learning in one service for transactions and lakehouse scale analytics. The service is optimized for Amazon Web Services (AWS) and offers unmatched performance and price-performance. It includes features such as MySQL HeatWave Lakehouse, MySQL HeatWave AutoML, MySQL HeatWave GenAI, and MySQL HeatWave Autopilot. The document also provides information on the architecture of MySQL HeatWave on AWS, its integration with AWS services, and its performance advantages compared to other services such as Amazon Redshift, Snowflake, Google BigQuery, and Azure Synapse. The document concludes by inviting readers to try MySQL HeatWave on AWS for free.", "error": null}
  *************************** 2. row ***************************
  id: 2
  Output: {"text": " MySQL HeatWave is a fully managed cloud service that provides integrated, automated, and secure generative AI and machine learning (ML) for transactions and lakehouse scale analytics. It offers unmatched performance and price-performance without the complexity, latency, risks, and cost of ETL duplication. MySQL HeatWave can be deployed in multiple clouds including OCI, AWS, Azure, and hybrid environments. The service is designed to overcome the limitations of traditional data warehouse, analytics, lakehouse, machine learning, Generative AI, and vector store environments that use periodic long-running ETL batch jobs to refresh the data.\n\nThe text provides an overview of MySQL HeatWave, a cloud service offered by Oracle. The service offers integrated, automated, and secure generative AI and machine learning for transactions and lakehouse scale analytics. It is designed to overcome the limitations of traditional data warehouse, analytics, lakehouse, machine learning, Generative AI, and vector store environments that use periodic long-running ETL batch jobs to refresh the data. MySQL HeatWave can be deployed in multiple clouds including OCI, AWS, Azure, and hybrid environments. The service offers unmatched performance and price-performance without the complexity, lat"}
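Each row of the output column holds a JSON document with a text member and, in the first sample row above, an error member. To read the summaries as plain text and check for per-row errors, a query along the following lines might help. It is a sketch that assumes every row follows that same JSON shape; the summary_text and error_message aliases are illustrative.

```sql
-- Show each summary as plain text, plus any per-row error.
-- Assumption: the Output column holds JSON with "text" and "error" members.
-- A JSON null error is displayed as the string null after JSON_UNQUOTE().
SELECT id,
       JSON_UNQUOTE(JSON_EXTRACT(Output, '$.text'))  AS summary_text,
       JSON_UNQUOTE(JSON_EXTRACT(Output, '$.error')) AS error_message
  FROM demo_db.output_table\G
```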
Learn more about Setting Up a Vector Store.