The
NL2ML
routine enables you to learn about MySQL AI AutoML by providing
relevant citations to MySQL AI documentation. You can also leverage
the
ML_GENERATE
routine or an MCP server with external LLMs to generate AutoML
queries you can copy and run.
This topic has the following sections.
To use this feature, you must load the appropriate version of MySQL AI documentation to the folder defined by
secure_file_priv. See Load MySQL AI Documentation.Review the required tasks to Create a Machine Learning Model.
Before using this feature, you must load the appropriate version
of the MySQL AI documentation into the MySQL AI directory defined by
secure_file_priv.
To load the documentation:
In the top-right corner of this page, make sure that the correct verion of MySQL AI is selected.
In the bottom-left corner of this page, click the link to download the PDF version of the documentation.
Rename the downloaded PDF file to
mysql_ai_en.pdf.-
Log into your MySQL AI instance and upload the PDF file to the MySQL AI directory defined by
secure_file_priv. Ensure that the file has the appropriate read access for all users, so that MySQL AI can read the file.If you do not know the appropriate directory, you can run the following command:
mysql> SELECT @@secure_file_privSee LOAD DATA Statement.
To use
NL2ML
to provide citations to MySQL HeatWave documentation, and then leverage
in-database LLMs to generate responses that include appropriate
table schemas and commands, do the following:
Set the
skip_generateoption totruewith the@nl2ml_optionssession variable.Use
ML_RETRIEVE_SCHEMA_METADATAto retrieve the table schema related to the question asked during theNL2MLroutine.Use
GROUP_CONCAT()to build a compact context string from the citations provided byNL2ML.Use
ML_GENERATEto specify the in-database LLM and generate the response to the question, which includes the citations, context, and retrieved table schema.
See the following example.
To use in-database LLMs with
NL2ML:
-
Specify the question and set it into a variable (
@input).mysql> SET @input = "How can I train a model to predict net worth of a singer?"; -
Set the
skip_generateoption totruewith the@nl2ml_optionssession variable.mysql> SET @nl2ml_options = JSON_OBJECT("skip_generate", true); -
Run the
NL2MLroutine and include the previous variable that has the question.mysql> CALL sys.NL2ML(@input, @out); -
View the output generated from the question by selecting the
@outvariable.mysql> SELECT JSON_PRETTY(@out); JSON_PRETTY(@out) { "citations": [ { "segment": "<segment content>", "distance": 0.1023, "document_name": <mysql_ai_en.pdf>, "segment_number": <segment number> }, ... ], "retrieval_info": { "method": "n_citations", "threshold": 0.114 } }The output includes citations with the following information:
segment: The relevant excerpts from the MySQL AI documentation.distance: A value indicating how relevant the segment is to the question asked. A lower value represents a more relevant segment.document_name: A reference to the MySQL AI documentation.segment_number: The index number identifying the segment.
-
Use
ML_RETRIEVE_SCHEMA_METADATAto retrieve the most relevant table schema for the previous question.mysql> CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL); -
Retrieve the table schema from the
@retrievedvariable.mysql> SELECT @retrieved; @retrieved CREATE TABLE `mlcorpus`.`singer`( `Singer_ID` int, `Name` varchar, `Birth_Year` double, `Net_Worth_Millions` double, `Citizenship` varchar ); -
Use
GROUP_CONCAT()to build a compact context string from the citations provided byNL2ML.mysql> SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx FROM JSON_TABLE(JSON_EXTRACT(@out,'$.citations'), '$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt; -
Combine the retrieved table schema and citations as the final context.
mysql> SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved); -
Use
ML_GENERATEto specify an in-database LLM (llama3.2-3b-instruct-v1) and manually create a SQL statement that includes the citations, context, and retrieved table schema.mysql> SELECT sys.ML_GENERATE( @input, JSON_OBJECT( "task", "generation", "model_id", "llama3.2-3b-instruct-v1", "context", @final_ctx ) ) INTO @result; -
Generate the output and SQL text.
mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql; generated_sql To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset, which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns, such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'. The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data, for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary. Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth, you would specify the task as 'regression' in the JSON options. Here's a simplified example: ```sql CALL sys.ML_TRAIN('mlcorpus.singer', @model_handle, 'Net_Worth_Millions', JSON_OBJECT('task', 'regression', 'algorithm', 'XGBRegressor')); ``` Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions' based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine to generate predictions for new, unseen data.
Review Machine Learning Use Cases.
Review the syntax and examples for the
NL2MLroutine.