4.6 Learn About MySQL AI AutoML with NL2ML

The NL2ML routine enables you to learn about MySQL AI AutoML by returning relevant citations to the MySQL AI documentation. You can also use the ML_GENERATE routine or an MCP server with external LLMs to generate AutoML queries that you can copy and run.

This topic has the following sections:

  • Before You Begin

  • Use NL2ML with In-Database LLMs

  • What's Next

Before You Begin

Load MySQL AI Documentation

Before using this feature, you must load the appropriate version of the MySQL AI documentation into the directory defined by the secure_file_priv system variable.

To load the documentation:

  1. In the top-right corner of this page, make sure that the correct version of MySQL AI is selected.

  2. In the bottom-left corner of this page, click the link to download the PDF version of the documentation.

  3. Rename the downloaded PDF file to mysql_ai_en.pdf.

  4. Log in to your MySQL AI instance and upload the PDF file to the directory defined by the secure_file_priv system variable. Ensure that the file is readable by the MySQL server so that MySQL AI can access it.

    If you do not know the appropriate directory, you can run the following command:

    mysql> SELECT @@secure_file_priv;
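
    The value depends on your installation; for example, Linux package installations commonly default to /var/lib/mysql-files/. A sample result, assuming that default (your path may differ):

    +------------------------+
    | @@secure_file_priv     |
    +------------------------+
    | /var/lib/mysql-files/  |
    +------------------------+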

    See LOAD DATA Statement.

Use NL2ML with In-Database LLMs

To use NL2ML to provide citations to the MySQL AI documentation, and then use in-database LLMs to generate responses that include appropriate table schemas and commands, do the following:

  • Set the skip_generate option to true with the @nl2ml_options session variable.

  • Use ML_RETRIEVE_SCHEMA_METADATA to retrieve the table schema related to the question asked during the NL2ML routine.

  • Use GROUP_CONCAT() to build a compact context string from the citations provided by NL2ML.

  • Use ML_GENERATE to specify the in-database LLM and generate the response to the question, which includes the citations, context, and retrieved table schema.

See the following example.

To use in-database LLMs with NL2ML:

  1. Specify the question and assign it to a variable (@input).

    mysql> SET @input = "How can I train a model to predict net worth of a singer?";
  2. Set the skip_generate option to true with the @nl2ml_options session variable.

    mysql> SET @nl2ml_options = JSON_OBJECT("skip_generate", true);
  3. Run the NL2ML routine, passing in the variable that holds the question.

    mysql> CALL sys.NL2ML(@input, @out);
  4. View the output generated from the question by selecting the @out variable.

    mysql> SELECT JSON_PRETTY(@out);
    JSON_PRETTY(@out)
    {
      "citations": [
        {
          "segment": "<segment content>",
          "distance": 0.1023,
          "document_name": <mysql_ai_en.pdf>,
          "segment_number": <segment number>
        },
        ...
      ],
      "retrieval_info": {
        "method": "n_citations",
        "threshold": 0.114
      }
    }

    The output includes citations with the following information:

    • segment: The relevant excerpts from the MySQL AI documentation.

    • distance: A value indicating how relevant the segment is to the question asked. A lower value represents a more relevant segment.

    • document_name: A reference to the MySQL AI documentation.

    • segment_number: The index number identifying the segment.
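
    For example, to inspect only the most relevant citation from @out, you can extract its fields directly. A quick check, assuming the output shape shown above:

    mysql> SELECT JSON_EXTRACT(@out, '$.citations[0].distance') AS top_distance,
                  JSON_UNQUOTE(JSON_EXTRACT(@out, '$.citations[0].segment')) AS top_segment;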

  5. Use ML_RETRIEVE_SCHEMA_METADATA to retrieve the most relevant table schema for the previous question.

    mysql> CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL);
  6. Retrieve the table schema from the @retrieved variable.

    mysql> SELECT @retrieved;
    @retrieved
    CREATE TABLE `mlcorpus`.`singer`(
      `Singer_ID` int,
      `Name` varchar,
      `Birth_Year` double,
      `Net_Worth_Millions` double,
      `Citizenship` varchar
    );
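
    Optionally, preview a few rows to confirm the retrieved table matches the question. This assumes the mlcorpus.singer table shown above exists in your instance:

    mysql> SELECT * FROM mlcorpus.singer LIMIT 5;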
  7. Use GROUP_CONCAT() to build a compact context string from the citations provided by NL2ML.

    mysql> SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx
           FROM JSON_TABLE(JSON_EXTRACT(@out, '$.citations'),
                '$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt;
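
    Note that GROUP_CONCAT() silently truncates its result at group_concat_max_len characters (1024 by default), which concatenated documentation segments can easily exceed. If the context looks cut off, raise the session limit and rerun the query:

    mysql> SET SESSION group_concat_max_len = 1024 * 1024;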
  8. Combine the retrieved table schema and citations as the final context.

    mysql> SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved);
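
    Because CONCAT() returns NULL if any argument is NULL, @final_ctx is NULL when NL2ML returns no citations. A quick sanity check before generating:

    mysql> SELECT @final_ctx IS NOT NULL AS ctx_ok, CHAR_LENGTH(@final_ctx) AS ctx_chars;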
  9. Use ML_GENERATE to specify an in-database LLM (llama3.2-3b-instruct-v1) and generate the response, manually supplying the combined citations, context, and retrieved table schema.

    mysql> SELECT sys.ML_GENERATE(
             @input,
             JSON_OBJECT(
               "task", "generation",
               "model_id", "llama3.2-3b-instruct-v1",
               "context", @final_ctx
             )
           ) INTO @result;
  10. Extract and view the generated text.

    mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql;
    generated_sql
    To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset, 
    which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns, 
    such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'. 
    
    The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data, 
    for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary.
    
    Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth, 
    you would specify the task as 'regression' in the JSON options. Here's a simplified example:
    
    ```sql
    CALL sys.ML_TRAIN('mlcorpus.singer', 
                      @model_handle, 
                      'Net_Worth_Millions', 
                      JSON_OBJECT('task', 'regression', 
                                  'algorithm', 'XGBRegressor'));
    ```
    
    Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions' 
    based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine 
    to generate predictions for new, unseen data.
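
    Treat the generated text as a starting point rather than a vetted statement, and verify routine signatures against the reference before running anything. For example, the generated ML_TRAIN call above passes the model handle as the second argument; if your version follows the HeatWave AutoML convention of table, target column, options, then model handle, the call would instead look like the following sketch:

    mysql> CALL sys.ML_TRAIN('mlcorpus.singer', 'Net_Worth_Millions',
                             JSON_OBJECT('task', 'regression'), @model_handle);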

What's Next