MySQL AI 9.6  /  ...  /  NL2ML

8.1.14 NL2ML

The NL2ML (natural language to machine learning) routine allows you to ask questions and receive relevant citations from MySQL AI documentation.

Note

To use this feature, you must load the appropriate version of MySQL AI documentation to the folder defined by secure_file_priv. See Load MySQL AI Documentation.

NL2ML Syntax

mysql> CALL sys.NL2ML (query, response);

NL2ML parameters:

  • query: Enter a question in natural language related to MySQL HeatWave AutoML. For example, "What are the different types of machine learning models I can create?".

  • response: The name of the JSON object session variable that contains the response to the question.

The nl2ml_options Session Variable

To use the NL2ML routine, you must set the skip_generate option to true. The default value is false. Review the following syntax example and see Use NL2ML with In-Database LLMs.

Syntax Example

After generating citations with NL2ML, retrieve a relevant table schema related to the question (ML_RETRIEVE_SCHEMA_METADATA), build a compact context string from the citations (GROUP_CONCAT), and generate a response that includes the citations, context, and retrieved table schema (ML_GENERATE).

mysql> SET @input = "How can I train a model to predict net worth of a singer?";

mysql> SET @nl2ml_options = JSON_OBJECT("skip_generate", true);

mysql> CALL sys.NL2ML(@input, @out);

mysql> SELECT JSON_PRETTY(@out);
JSON_PRETTY(@out)
{
  "citations": [
    {
      "segment": "<segment content>",
      "distance": 0.1023,
      "document_name": <mysql_ai_en.pdf>,
      "segment_number": <segment number>
    },
    ...
  ],
  "retrieval_info": {
    "method": "n_citations",
    "threshold": 0.114
  }
}
mysql> CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL);
  
mysql> SELECT @retrieved;
@retrieved
CREATE TABLE `mlcorpus`.`singer`(
  `Singer_ID` int,
  `Name` varchar,
  `Birth_Year` double,
  `Net_Worth_Millions` double,
  `Citizenship` varchar
);

mysql> SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx
FROM JSON_TABLE(JSON_EXTRACT(@out,'$.citations'),
'$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt;
  
mysql> SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved);
  
mysql> SELECT sys.ML_GENERATE(
@input,
JSON_OBJECT(
"task", "generation",
"model_id", "llama3.2-3b-instruct-v1",
"context", @final_ctx
)
) INTO @result;

mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql;
generated_sql
To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset, 
which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns, 
such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'. 

The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data, 
for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary.

Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth, 
you would specify the task as 'regression' in the JSON options. Here's a simplified example:

```sql
CALL sys.ML_TRAIN('mlcorpus.singer', 
                  @model_handle, 
                  'Net_Worth_Millions', 
                  JSON_OBJECT('task', 'regression', 
                              'algorithm', 'XGBRegressor'));
```

Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions' 
based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine 
to generate predictions for new, unseen data.