Chapter 3 HeatWave ML

HeatWave ML makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. You provide the data, and HeatWave ML analyzes the characteristics of the data and creates an optimized machine learning model that you can use to generate predictions and explanations. An ML model makes predictions by identifying patterns in your data and applying those patterns to unseen data. HeatWave ML explanations help you understand how predictions are made, such as which features of a dataset contribute most to a prediction.

Supervised Learning

HeatWave ML supports supervised machine learning. That is, it creates a machine learning model by analyzing a labeled dataset to learn patterns that enable it to predict labels based on the features of the dataset. The examples in this guide use the Census Income Data Set, in which features such as age, education, occupation, and country are used to predict an individual's income (the label).
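To make the idea of learning from labeled data concrete, the following toy sketch learns a single threshold rule (a "decision stump") from a few hypothetical, census-like rows and then applies it to rows whose label is unknown. It only illustrates supervised learning; it is not the algorithm HeatWave ML uses, and the column names and values are invented.

```python
# Toy supervised-learning sketch: NOT HeatWave ML's algorithm.
# Rows, column names, and values below are hypothetical.

def train_stump(rows, features, label):
    """Learn a one-feature threshold rule from labeled rows."""
    labels = sorted({row[label] for row in rows})
    best = None  # (rows_correct, feature, threshold, label_if_low, label_if_high)
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            for low, high in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                correct = sum(
                    (low if row[feature] <= threshold else high) == row[label]
                    for row in rows
                )
                if best is None or correct > best[0]:
                    best = (correct, feature, threshold, low, high)
    _, feature, threshold, low, high = best

    def predict(row):
        # No label is needed at prediction time: only the learned feature is used.
        return low if row[feature] <= threshold else high

    return predict


# Labeled training data: features plus the known label ("income").
train = [
    {"age": 22, "hours_per_week": 20, "income": "<=50K"},
    {"age": 25, "hours_per_week": 40, "income": "<=50K"},
    {"age": 31, "hours_per_week": 35, "income": "<=50K"},
    {"age": 45, "hours_per_week": 50, "income": ">50K"},
    {"age": 52, "hours_per_week": 45, "income": ">50K"},
    {"age": 60, "hours_per_week": 40, "income": ">50K"},
]
predict = train_stump(train, ["age", "hours_per_week"], "income")

# Unseen rows: the label is unknown, so the model predicts it.
print(predict({"age": 28, "hours_per_week": 60}))  # <=50K
print(predict({"age": 50, "hours_per_week": 10}))  # >50K
```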

Once a model is created, it can be used to make predictions on unseen data, where the label is unknown. In a business setting, predictive models have a variety of applications, such as predicting customer churn, approving or rejecting credit applications, and predicting customer wait times.

HeatWave ML supports both classification and regression models. A classification model predicts discrete values, such as whether an email is spam, whether a loan application should be approved or rejected, or which product a customer might be interested in purchasing. A regression model predicts continuous values, such as customer wait times, expected sales, or home prices. The model type is selected during training; classification is the default.
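The following sketch contrasts the two model types using a hypothetical k-nearest-neighbors learner, chosen only because it is short. As described above, the task is fixed when the model is trained and defaults to classification. Classification returns the most common label among the nearest training rows (a discrete value); regression returns the average of their targets (a continuous value). The data is invented, and this is not how HeatWave ML builds models.

```python
import math
from collections import Counter


def train_knn(rows, k=3, task="classification"):
    """Return a predictor for `task`; rows are (feature_tuple, target) pairs.

    Hypothetical k-nearest-neighbors sketch, not a HeatWave ML API.
    """
    if task not in ("classification", "regression"):
        raise ValueError(f"unsupported task: {task!r}")

    def predict(features):
        # The k training rows closest to `features` (Euclidean distance).
        nearest = sorted(rows, key=lambda row: math.dist(row[0], features))[:k]
        targets = [target for _, target in nearest]
        if task == "classification":
            # Discrete output: the most common label among the neighbors.
            return Counter(targets).most_common(1)[0][0]
        # Continuous output: the mean target value of the neighbors.
        return sum(targets) / len(targets)

    return predict


# Classification: (links, exclamation marks) -> "spam" or "ham".
emails = [((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
          ((8, 5), "spam"), ((9, 7), "spam"), ((7, 6), "spam")]
is_spam = train_knn(emails)  # task defaults to "classification"
print(is_spam((8, 6)))  # spam

# Regression: (floor area in square meters,) -> price.
homes = [((50,), 150_000), ((60,), 180_000), ((80,), 240_000), ((100,), 300_000)]
price = train_knn(homes, k=2, task="regression")
print(price((70,)))  # 210000.0
```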

Ease of Use

HeatWave ML is purpose-built for ease of use. It requires no machine learning expertise, specialized tools, or knowledge of algorithms. With HeatWave ML and a set of training data, you can train a predictive machine learning model with a single call to the ML_TRAIN SQL routine; for example:

CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue', NULL, @census_model);

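In this example, heatwaveml_bench.census_train is the table of training data, revenue is the target column that holds the label, NULL specifies that no training options are set so that defaults apply, and @census_model is a session variable that receives the handle of the trained model. The ML_TRAIN routine uses Oracle AutoML technology to automate the training of machine learning models. For information about Oracle AutoML, see Oracle AutoML.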

You can use a model created by ML_TRAIN with other HeatWave ML routines to generate predictions and explanations; for example, this call to the ML_PREDICT_TABLE routine generates predictions for a table of input data:

CALL sys.ML_PREDICT_TABLE('heatwaveml_bench.census_test', @census_model, 
'heatwaveml_bench.census_predictions');

All HeatWave ML operations are initiated by running CALL or SELECT statements, which can be easily integrated into your applications. HeatWave ML routines reside in the MySQL sys schema and can be run from any MySQL client or application that is connected to a DB System with a HeatWave Cluster. HeatWave ML routines include ML_TRAIN, the ML_PREDICT_* and ML_EXPLAIN_* routines, and ML_SCORE.
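Because HeatWave ML routines are ordinary SQL statements, an application can issue them through a standard MySQL connector. The following is a minimal Python sketch that reuses the ML_TRAIN and ML_PREDICT_TABLE calls shown earlier. It assumes a DB-API 2.0 cursor that uses %s placeholders (MySQL Connector/Python is one example). The helper name, its parameters, and the optional JSON_OBJECT('task', ...) training option are assumptions for illustration; check the ML_TRAIN reference for the options your HeatWave version supports. Depending on the connector, you may also need to consume any result sets a procedure returns before issuing the next statement.

```python
def train_and_predict(
    cursor,
    train_table="heatwaveml_bench.census_train",
    target="revenue",
    test_table="heatwaveml_bench.census_test",
    output_table="heatwaveml_bench.census_predictions",
    task=None,
):
    """Train a model, write predictions to `output_table`, return the model handle.

    Hypothetical helper; `cursor` is any DB-API 2.0 cursor with %s placeholders,
    connected to a DB System with a HeatWave Cluster.
    """
    if task is None:
        # NULL options: train with default settings (classification).
        cursor.execute(
            "CALL sys.ML_TRAIN(%s, %s, NULL, @census_model)", (train_table, target)
        )
    else:
        # Assumed option format: a JSON object such as {"task": "regression"}.
        cursor.execute(
            "CALL sys.ML_TRAIN(%s, %s, JSON_OBJECT('task', %s), @census_model)",
            (train_table, target, task),
        )
    cursor.execute(
        "CALL sys.ML_PREDICT_TABLE(%s, @census_model, %s)", (test_table, output_table)
    )
    # ML_TRAIN stored the model handle in a session variable; read it back.
    cursor.execute("SELECT @census_model")
    return cursor.fetchone()[0]
```

User-defined variables such as @census_model are scoped to a session, so all three statements must run on the same connection.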

In addition, with HeatWave ML, there is no need to move or reformat your data. Data and machine learning models never leave the MySQL Database Service, which saves you time and effort while keeping your data and models secure.

HeatWave ML Workflow

The general HeatWave ML workflow is described below:

  1. When the ML_TRAIN routine is called, HeatWave ML requests the training data from the MySQL DB System where it resides. The training data is sent from the MySQL DB System and distributed across the HeatWave Cluster, which performs machine learning computation in parallel. See Section 3.4, “Training a Model”.

  2. HeatWave ML analyzes the training data, trains an optimized machine learning model, and stores the model in a model catalog on the MySQL DB System. See Section 3.9.1, “The Model Catalog”.

  3. HeatWave ML ML_PREDICT_* and ML_EXPLAIN_* routines use the trained model to generate predictions and explanations on test or unseen data. See Section 3.6, “Predictions”, and Section 3.7, “Explanations”.

  4. Predictions and explanations are returned to the DB System and to the user or application that issued the query.
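Step 1 relies on data parallelism: the training data is split across nodes, each node works on its own share, and the partial results are combined. The following toy sketch shows that split, compute, and merge pattern for one simple computation, the mean and variance of a numeric column. Threads stand in for hypothetical cluster nodes, so it illustrates the structure rather than real speedup, and it does not describe how HeatWave distributes data internally.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_nodes):
    """Split values round-robin into n_nodes shards (stand-ins for nodes)."""
    return [values[i::n_nodes] for i in range(n_nodes)]


def partial_stats(shard):
    """Work done on one node: count, sum, and sum of squares of its shard."""
    return len(shard), sum(shard), sum(x * x for x in shard)


def distributed_mean_variance(values, n_nodes=4):
    shards = partition(values, n_nodes)
    # Each shard is processed concurrently; threads only simulate separate nodes.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_stats, shards))
    # Merge: counts and sums from different shards can simply be added.
    n = sum(count for count, _, _ in partials)
    total = sum(s for _, s, _ in partials)
    total_sq = sum(sq for _, _, sq in partials)
    mean = total / n
    # Population variance. The sum-of-squares formula is fine for a sketch but
    # can lose precision on large values, where a numerically stable merge is better.
    return mean, total_sq / n - mean * mean


print(distributed_mean_variance([2, 4, 4, 4, 5, 5, 7, 9], n_nodes=3))  # (5.0, 4.0)
```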

Optionally, you can use the ML_SCORE routine to evaluate the quality of a model and help confirm that its predictions and explanations are reliable. See Section 3.9.6, “Scoring Models”.
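Scoring compares a model's predictions with the known labels of data that was not used for training. To illustrate what such a score measures, the following sketch computes two common classification metrics by hand: plain accuracy and balanced accuracy (the average of per-class recall). The data is hypothetical and the functions are not ML_SCORE itself; see Section 3.9.6 for the metrics ML_SCORE supports. The example shows why the choice of metric matters: on imbalanced data, a model that always predicts the majority class looks good on accuracy but not on balanced accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in set(y_true):
        positions = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in positions)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


# Hypothetical labeled test data: 8 low-income rows, 2 high-income rows.
y_true = ["<=50K"] * 8 + [">50K"] * 2
y_pred = ["<=50K"] * 10  # a model that always predicts the majority class

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```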

Note

HeatWave ML shares resources with HeatWave. HeatWave analytics queries are given priority over HeatWave ML queries. Concurrent HeatWave analytics and HeatWave ML queries are not supported. A HeatWave ML query must wait for HeatWave analytics queries to finish, and vice versa.

Oracle AutoML

The HeatWave ML ML_TRAIN routine uses Oracle AutoML technology to automate the process of training a machine learning model. Oracle AutoML replaces the laborious, time-consuming tasks that a data analyst would otherwise perform manually in the following workflow:

  1. Selecting a model from a large number of viable candidate models.

  2. Tuning the hyperparameters of each model.

  3. Selecting only predictive features to speed up the pipeline and reduce overfitting.

  4. Ensuring the model performs well on unseen data (also called generalization).
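To show the effort that Oracle AutoML removes, the following toy sketch performs the manual workflow above by brute force. It tries every candidate model (step 1) with every hyperparameter setting in a small grid (step 2) and keeps the combination that scores best on a held-out validation set, which estimates generalization (step 4). Feature selection (step 3) is omitted for brevity. The candidate models, grid, and data are hypothetical, and this is not how Oracle AutoML searches.

```python
import math
from collections import Counter


def make_majority(train):
    """Baseline model: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def make_knn(train, k):
    """k-nearest-neighbors model; k is its hyperparameter."""
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


# Step 1: candidate models. Step 2: a hyperparameter grid for each one.
CANDIDATES = {
    "majority": (make_majority, [{}]),
    "knn": (make_knn, [{"k": 1}, {"k": 3}, {"k": 5}]),
}


def validation_accuracy(model, validation):
    return sum(model(x) == y for x, y in validation) / len(validation)


def manual_search(train, validation):
    """Return (score, model_name, hyperparameters) of the best combination."""
    best = None
    for name, (factory, grid) in CANDIDATES.items():
        for params in grid:
            model = factory(train, **params)
            # Step 4: judge each model on data it was not trained on.
            score = validation_accuracy(model, validation)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b")]
validation = [((0.5, 0.5), "a"), ((5.5, 5.5), "b"), ((1, 1), "a"), ((6, 6.5), "b")]
print(manual_search(train, validation))  # (1.0, 'knn', {'k': 1})
```

The number of trials grows with every added candidate and hyperparameter value, which is the cost that Oracle AutoML aims to reduce.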

Oracle AutoML automates this workflow, providing you with an optimal model given a time budget. The Oracle AutoML pipeline used by the HeatWave ML ML_TRAIN routine has these stages:

  • Data preprocessing

  • Algorithm selection

  • Adaptive data reduction

  • Hyperparameter optimization

  • Model and prediction explanations
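The first stage, data preprocessing, prepares raw columns for learning. As an illustration of what such a stage typically does, the following sketch fills missing numeric values with the training mean and one-hot encodes categorical columns. The column names are hypothetical, and these are generic techniques, not a description of the exact transformations Oracle AutoML applies.

```python
def fit_preprocessor(rows, numeric, categorical):
    """Learn imputation values and category lists from training rows only."""
    means = {}
    for col in numeric:
        present = [row[col] for row in rows if row.get(col) is not None]
        means[col] = sum(present) / len(present)
    categories = {
        col: sorted({row[col] for row in rows if row.get(col) is not None})
        for col in categorical
    }

    def transform(row):
        """Turn one row into a flat list of numbers."""
        vector = []
        for col in numeric:
            value = row.get(col)
            # Mean imputation: replace a missing value with the training mean.
            vector.append(means[col] if value is None else value)
        for col in categorical:
            # One-hot encoding: one 0/1 slot per category seen in training;
            # an unseen or missing category becomes all zeros.
            vector.extend(1 if row.get(col) == cat else 0 for cat in categories[col])
        return vector

    return transform


train = [
    {"age": 30, "education": "Bachelors"},
    {"age": None, "education": "Masters"},
    {"age": 50, "education": "Bachelors"},
]
transform = fit_preprocessor(train, numeric=["age"], categorical=["education"])
print(transform({"age": None, "education": "Masters"}))  # [40.0, 0, 1]
print(transform({"age": 25, "education": "Doctorate"}))  # [25, 0, 0]
```

Fitting the means and category lists on training rows only keeps statistics from test data from leaking into the model.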

Figure 3.1 Oracle AutoML Pipeline

Oracle AutoML also produces high-quality models efficiently. It does so through a scalable design and through intelligent choices that reduce the number of trials at each stage of the pipeline.

  • Scalable design: The Oracle AutoML pipeline is able to exploit both HeatWave internode and intranode parallelism, which improves scalability and reduces runtime.

  • Intelligent choices reduce trials in each stage: Algorithms and parameters are chosen based on dataset characteristics, so that an accurate model is selected efficiently. This is achieved using meta-learning throughout the pipeline.
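Meta-learning means using experience from previously seen datasets to make choices for a new one. The following toy sketch shows the idea in its simplest form: it describes a dataset by a few meta-features (row count, column count, and share of categorical columns) and reuses the algorithm that worked best on the most similar past dataset. The experience table, meta-features, and algorithm names are all hypothetical, and Oracle AutoML's actual meta-learning is not reproduced here.

```python
import math


def meta_features(n_rows, n_features, n_categorical):
    """Describe a dataset with a few scale-free numbers."""
    return (
        math.log10(n_rows),          # order of magnitude of the row count
        math.log10(n_features),      # order of magnitude of the column count
        n_categorical / n_features,  # share of categorical columns
    )


# Hypothetical experience: meta-features of past datasets and the algorithm
# that performed best on each.
PAST_RESULTS = [
    (meta_features(1_000, 10, 1), "knn"),
    (meta_features(1_000_000, 50, 35), "gradient_boosting"),
    (meta_features(100_000, 200, 0), "linear_model"),
]


def recommend_algorithm(n_rows, n_features, n_categorical):
    """Pick the algorithm that won on the most similar past dataset."""
    query = meta_features(n_rows, n_features, n_categorical)
    _, algorithm = min(PAST_RESULTS, key=lambda item: math.dist(item[0], query))
    return algorithm


print(recommend_algorithm(2_000_000, 40, 30))  # gradient_boosting
```

Starting from an informed guess like this lets a search skip candidates that are unlikely to win, which is how meta-learning reduces the number of trials.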

For additional information about Oracle AutoML, refer to Yakovlev, Anatoly, et al. "Oracle AutoML: A Fast and Predictive AutoML Pipeline." Proceedings of the VLDB Endowment 13.12 (2020): 3166-3180.