The ML_GENERATE_TABLE
routine runs multiple
text generation or summarization queries in a batch, in
parallel. The output generated for every input query is the same
as the output generated by the
ML_GENERATE
routine.
To alter an existing table or create a new table, MySQL
requires you to set the
sql-require-primary-key
system variable to 0
.
This routine is available in MySQL 9.0.1-u1 and later versions.
To learn about the privileges you need to run this routine, see Required Privileges.
mysql> call sys.ML_GENERATE_TABLE('InputTableColumn', 'OutputTableColumn', [options]);
options: {
JSON_OBJECT('key','value'[,'key','value'] ...)
'key','value': {
['task', {'generation'|'summarization'}]
['model_id', {'mistral-7b-instruct-v1'|'llama2-7b-v1'|'llama3-8b-instruct-v1'|'cohere.command-r-plus-08-2024'|'cohere.command-r-08-2024'|'meta.llama-3.1-70b-instruct'|
'meta.llama-3.1-405b-instruct'}]
['context_column', 'ContextColumn']
['language', 'Language']
['temperature', Temperature]
['max_tokens', MaxTokens]
['top_k', K]
['top_p', P]
['repeat_penalty', RepeatPenalty]
['frequency_penalty', FrequencyPenalty]
['presence_penalty', PresencePenalty]
['stop_sequences', JSON_ARRAY('StopSequence1'[,'StopSequence2'] ...)]
['batch_size', BatchSize]
}
}
Following are ML_GENERATE_TABLE
parameters:
-
InputTableColumn
: specifies the names of the input database, table, and column that contains the natural-language queries. TheInputTableColumn
is specified in the following format:DBName
.TableName
.ColumnName
.The specified input table can be an internal or external table.
The specified input table must already exist, must not be empty, and must have a primary key.
The input column must already exist and must contain
text
orvarchar
values.The input column must not be a part of the primary key and must not have
NULL
values or empty strings.There must be no backticks used in the
DBName
,TableName
, orColumnName
and there must be no period used in theDBName
orTableName
.
-
OutputTableColumn
: specifies the names of the database, table, and column where the generated text-based response is stored. TheOutputTableColumn
is specified in the following format:DBName
.TableName
.ColumnName
.The specified output table must be an internal table.
If the specified output table already exists, then it must be the same as the input table. And, the specified output column must not already exist in the input table. A new JSON column is added to the table. External tables are read only. So if input table is an external table, then it cannot be used to store the output.
If the specified output table doesn't exist, then a new table is created. The new output table has key columns which contains the same primary key values as the input table and a JSON column that stores the generated text-based responses.
There must be no backticks used in the
DBName
,TableName
, orColumnName
and there must be no period used in theDBName
orTableName
.
-
options
: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:-
task
: specifies the task expected from the large language model (LLM). Default value isgeneration
. Possible values are:generation
: generates text-based content.summarization
: generates a summary for existing text-based content.
-
model_id
: specifies the LLM to use for the task. Default value ismistral-7b-instruct-v1
. Possible values are:mistral-7b-instruct-v1
llama2-7b-v1
llama3-8b-instruct-v1
The following OCI Generative AI Service LLMs are available in MySQL 9.1.2 and later versions:
cohere.command-r-plus-08-2024
cohere.command-r-08-2024
meta.llama-3.1-70b-instruct
meta.llama-3.1-405b-instruct
To view the lists of supported LLMs, see HeatWave In-Database LLMs and OCI Generative AI Service LLMs.
NoteThe
summarization
task supports HeatWave In-Database LLMs only. context_column
: specifies the table column that contains the context to be used for augmenting the queries and guiding the text generation of the LLM. The specified column must be an existing column in the input table. Default value isNULL
.-
language
: specifies the language to be used for writing queries, ingesting documents, and generating the output. To set the value of thelanguage
parameter, use the two-letterISO 639-1
code for the language.Default value is
en
.For possible values, to view the list of supported languages, see Languages.
-
temperature
: specifies a non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations.Default value is:
0
for the HeatWave In-Database LLMs.0.3
forcohere.command-r-plus-08-2024
andcohere.command-r-08-2024
.0
formeta.llama-3.1-70b-instruct
andmeta.llama-3.1-405b-instruct
.
Possible values are float values between:
0
and5
For the HeatWave In-Database LLMs.0
and1
forcohere.command-r-plus-08-2024
andcohere.command-r-08-2024
.0
and5
formeta.llama-3.1-70b-instruct
andmeta.llama-3.1-405b-instruct
.
It is suggested that:
To generate the same output for a particular prompt every time you run it, set the temperature to
0
.To generate a random new statement for a particular prompt every time you run it, increase the temperature.
-
max_tokens
: specifies the maximum number of tokens to predict per generation using an estimate of three tokens per word. Default value is256
. Possible values are:For
mistral-7b-instruct-v1
, integer values between1
and8000
.For
llama2-7b-v1
, integer values between1
and4096
.For
llama3-8b-instruct-v1
, integer values between1
and4096
.For
cohere.command-r-plus-08-2024
andcohere.command-r-08-2024
, integer values between1
and4000
.For
meta.llama-3.1-70b-instruct
andmeta.llama-3.1-405b-instruct
, integer values between0
and128000
.
top_k
: specifies the number of top most likely tokens to consider for text generation at each step. Default value is40
, which means that top 40 most likely tokens are considered for text generation at each step. Possible values are integer values between0
and32000
.-
top_p
: specifies a number,p
, and ensures that only the most likely tokens with the sum of probabilitiesp
are considered for generation at each step. A higher value ofp
introduces more randomness into the output. Default value is0.95
. Possible values are float values between0
and1
.To disable this method, set to
1.0
or0
.To eliminate tokens with low likelihood, assign
p
a lower value. For example, if set to0.1
, tokens within top 10% probability are included.To include tokens with low likelihood, assign
p
a higher value. For example, if set to0.9
, tokens within top 90% probability are included.
If you are also specifying the
top_k
parameter, the LLM considers only the top tokens whose probabilities add up top
percent. It ignores the rest of thek
tokens. -
repeat_penalty
: assigns a penalty when a token appears repeatedly. High penalties encourage less repeated tokens and produce more random outputs. Default value is1.1
. Possible values are float values between0
and2
.NoteThis parameter is supported for HeatWave In-Database LLMs only.
frequency_penalty
: assigns a penalty when a token appears frequently. High penalties encourage less repeated tokens and produce more random outputs. Default value is0
. Possible values are float values between0
and1
.-
presence_penalty
: assigns a penalty to each token when it appears in the output to encourage generating outputs with tokens that haven't been used. This is similar tofrequency_penalty
, except that this penalty is applied equally to all tokens that have already appeared, irrespective of their exact frequencies.NoteThis parameter is supported for OCI Generative AI Service LLMs only.
Default value is
0
. Possible values are:For
cohere.command-r-plus-08-2024
andcohere.command-r-08-2024
, float values between0
and1
.For
meta.llama-3.1-70b-instruct
andmeta.llama-3.1-405b-instruct
, float values between-2
and2
.
stop_sequences
: specifies a list of characters such as a word, a phrase, a newline, or a period that tells the LLM when to end the generated output. If you have more than one stop sequence, then the LLM stops when it reaches any of those sequences. Default value isNULL
.batch_size
: specifies the batch size for the routine. This parameter is supported for internal tables only. Default value is1000
. Possible values are integer values between1
and1000
.
-
Following are examples of the
ML_GENERATE_TABLE
routine:
To alter an existing table or create a new table, MySQL
requires you to set the
sql-require-primary-key
system variable to 0
.
-
Consider the following input table
demo_db.input_table
:mysql> select * from demo_db.input_table; +----+----------------------------------+ | id | Input | +----+----------------------------------+ | 1 | What is artificial intelligence? | | 2 | What is MySQL? | +----+----------------------------------+
Generate English text-based content in a batch using the
mistral-7b-instruct-v1
model for queries stored indemo_db.input_table
:mysql> call sys.ML_GENERATE_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("task", "generation", "model_id", "mistral-7b-instruct-v1", "language", "en"));
The routine takes input queries stored in the
demo_db.input_table.Input
column and creates a new columndemo_db.output_table.Output
where it stores the generated text-based responses.The output table contains the following fields:
mysql> describe demo_db.output_table; +--------+------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------+------+------+-----+---------+-------+ | id | int | NO | PRI | NULL | | | Output | json | YES | | NULL | | +--------+------+------+-----+---------+-------+
View the contents of the output table:
mysql> select * from demo_db.output_table; +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | Output | +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | {"text": " Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and act like humans. It involves the development of algorithms and computer programs that can perform tasks that typically require human-level intelligence, such as problem solving, decision making, learning, and understanding natural language. AI systems use techniques such as machine learning, deep learning, and neural networks to enable machines to learn from data and improve their performance over time without explicit programming. AI has the potential to revolutionize many industries and change the way we live and work."} | | 2 | {"text": " MySQL is an open-source relational database management system (RDBMS) that is widely used for web, mobile and enterprise applications. It is designed to handle a large amount of data and users, and it offers features such as scalability, security, and flexibility. MySQL is developed and maintained by the MySQL Community, which is made up of thousands of contributors from around the world. It is also supported by Oracle, who acquired the company behind MySQL in 2010."} | +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+