The NL2ML (natural language to machine
learning) routine allows you to ask questions and receive
relevant citations from MySQL AI documentation.
To use this feature, you must load the appropriate version of
MySQL AI documentation to the folder defined by
secure_file_priv. See
Load MySQL AI Documentation.
mysql> CALL sys.NL2ML (query, response);
NL2ML parameters:
query: Enter a question in natural language related to MySQL HeatWave AutoML. For example, "What are the different types of machine learning models I can create?".response: The name of the JSON object session variable that contains the response to the question.
To use the NL2ML routine, you must set the
skip_generate option to
true. The default value is
false. Review the following syntax example
and see Use NL2ML with In-Database LLMs.
After generating citations with NL2ML,
retrieve a relevant table schema related to the question
(ML_RETRIEVE_SCHEMA_METADATA), build a
compact context string from the citations
(GROUP_CONCAT), and generate a response
that includes the citations, context, and retrieved table
schema (ML_GENERATE).
mysql> SET @input = "How can I train a model to predict net worth of a singer?";
mysql> SET @nl2ml_options = JSON_OBJECT("skip_generate", true);
mysql> CALL sys.NL2ML(@input, @out);
mysql> SELECT JSON_PRETTY(@out);
JSON_PRETTY(@out)
{
"citations": [
{
"segment": "<segment content>",
"distance": 0.1023,
"document_name": <mysql_ai_en.pdf>,
"segment_number": <segment number>
},
...
],
"retrieval_info": {
"method": "n_citations",
"threshold": 0.114
}
}
mysql> CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL);
mysql> SELECT @retrieved;
@retrieved
CREATE TABLE `mlcorpus`.`singer`(
`Singer_ID` int,
`Name` varchar,
`Birth_Year` double,
`Net_Worth_Millions` double,
`Citizenship` varchar
);
mysql> SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx
FROM JSON_TABLE(JSON_EXTRACT(@out,'$.citations'),
'$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt;
mysql> SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved);
mysql> SELECT sys.ML_GENERATE(
@input,
JSON_OBJECT(
"task", "generation",
"model_id", "llama3.2-3b-instruct-v1",
"context", @final_ctx
)
) INTO @result;
mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql;
generated_sql
To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset,
which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns,
such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'.
The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data,
for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary.
Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth,
you would specify the task as 'regression' in the JSON options. Here's a simplified example:
```sql
CALL sys.ML_TRAIN('mlcorpus.singer',
@model_handle,
'Net_Worth_Millions',
JSON_OBJECT('task', 'regression',
'algorithm', 'XGBRegressor'));
```
Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions'
based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine
to generate predictions for new, unseen data.