Table of Contents
- 3.1 Before You Begin
- 3.2 Getting Started
- 3.3 Preparing Data
- 3.4 Training a Model
- 3.5 Predictions
- 3.6 Explanations
- 3.7 Managing Models
- 3.8 HeatWave ML Routines
- 3.9 Supported Data Types
- 3.10 Monitoring
- 3.11 HeatWave ML Error Messages
- 3.12 Limitations
HeatWave ML makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. You provide the data, and HeatWave ML analyzes the characteristics of the data and creates an optimized machine learning model that you can use to generate predictions and explanations. An ML model makes predictions by identifying patterns in your data and applying those patterns to unseen data. HeatWave ML explanations help you understand how predictions are made, such as which features of a dataset contribute most to a prediction.
HeatWave ML supports supervised machine learning. That is, it creates a machine learning model by analyzing a labeled dataset to learn patterns that enable it to predict labels based on the features of the dataset. For example, this guide uses the Census Income Data Set in its examples, where features such as age, education, occupation, and country are used to predict an individual's income (the label).
Once a model is created, it can be used on unseen data, where the label is unknown, to make predictions. In a business setting, predictive models have a variety of possible applications, such as predicting customer churn, approving or rejecting credit applications, and predicting customer wait times.
HeatWave ML supports both classification and regression models. A classification model predicts discrete values, such as whether an email is spam or not, whether a loan application should be approved or rejected, or what product a customer might be interested in purchasing. A regression model predicts continuous values, such as customer wait times, expected sales, or home prices. The model type is selected during training, with classification being the default type.
HeatWave ML is purpose-built for ease of use. It requires no
machine learning expertise, specialized tools, or algorithms. With
HeatWave ML and a set of training data, you can train a predictive
machine learning (ML) model with a single call to the
ML_TRAIN SQL routine; for example:
CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue', NULL, @census_model);
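In the call above, passing NULL as the options argument accepts the defaults, so HeatWave ML trains a classification model for the discrete revenue label. The sketch below shows how a regression model might be requested instead. It assumes the task key of the ML_TRAIN options argument (check the ML_TRAIN reference for your MySQL version) and uses a hypothetical table, mydb.house_prices, with a hypothetical numeric price column; neither appears elsewhere in this guide.

```sql
-- Default: NULL options, so a classification model is trained
-- to predict the discrete 'revenue' label.
CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue', NULL, @census_model);

-- Regression: request it explicitly through the options JSON object.
-- 'mydb.house_prices' and 'price' are hypothetical names for illustration.
CALL sys.ML_TRAIN('mydb.house_prices', 'price',
                  JSON_OBJECT('task', 'regression'), @price_model);
```

Each call stores the handle of the new model in the session variable given as the last argument, which the other routines then accept in place of the model itself.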
You can use a model created by
ML_TRAIN with other HeatWave ML routines to generate predictions and
explanations; for example, this call to the
ML_PREDICT_TABLE routine generates predictions for a table of input data:
CALL sys.ML_PREDICT_TABLE('heatwaveml_bench.census_test', @census_model, 'heatwaveml_bench.census_predictions');
All HeatWave ML operations are initiated by running
CALL or SELECT statements, which can be
easily integrated into your applications. HeatWave ML routines
reside in the MySQL
sys schema and can be run
from any MySQL client or application that is connected to a DB
System with a HeatWave Cluster. HeatWave ML routines include:
ML_TRAIN: Trains a machine learning model for a given training dataset.
ML_PREDICT_ROW: Makes predictions for one or more rows of data.
ML_PREDICT_TABLE: Makes predictions for a table of data.
ML_EXPLAIN_ROW: Explains predictions for one or more rows of data.
ML_EXPLAIN_TABLE: Explains predictions for a table of data.
ML_SCORE: Computes the quality of a model.
ML_MODEL_LOAD: Loads a machine learning model for predictions and explanations.
ML_MODEL_UNLOAD: Unloads a machine learning model.
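The routines above can be chained over the life of one model. The following is a sketch rather than a reference: it assumes the model handle from the earlier ML_TRAIN call is still in @census_model, that the census tables use the hyphenated UCI attribute names as columns, and that a labeled table named heatwaveml_bench.census_validate exists for scoring. The output table heatwaveml_bench.census_explanations is a hypothetical name. Routine arguments differ between MySQL versions, so confirm them against the routine reference before use.

```sql
-- Load the trained model into HeatWave. NULL for the user argument
-- means the model belongs to the current user.
CALL sys.ML_MODEL_LOAD(@census_model, NULL);

-- A single row of input, given as a JSON object of feature names to values.
-- Column names assume the hyphenated UCI census attribute names.
SET @row_input = JSON_OBJECT(
  'age', 25,
  'workclass', 'Private',
  'fnlwgt', 226802,
  'education', '11th',
  'education-num', 7,
  'marital-status', 'Never-married',
  'occupation', 'Machine-op-inspct',
  'relationship', 'Own-child',
  'race', 'Black',
  'sex', 'Male',
  'capital-gain', 0,
  'capital-loss', 0,
  'hours-per-week', 40,
  'native-country', 'United-States');

-- Predict and explain that row (both routines are called with SELECT).
SELECT sys.ML_PREDICT_ROW(@row_input, @census_model);
SELECT sys.ML_EXPLAIN_ROW(@row_input, @census_model);

-- Explain a whole table; results are written to the named output table
-- (hypothetical name).
CALL sys.ML_EXPLAIN_TABLE('heatwaveml_bench.census_test', @census_model,
                          'heatwaveml_bench.census_explanations');

-- Score the model on labeled data with a chosen metric;
-- the result lands in @score.
CALL sys.ML_SCORE('heatwaveml_bench.census_validate', 'revenue', @census_model,
                  'balanced_accuracy', @score);
SELECT @score;

-- Release HeatWave memory once the model is no longer needed.
CALL sys.ML_MODEL_UNLOAD(@census_model);
```

Row-level routines suit interactive, per-record requests from an application, while the table-level routines write their results to a table that can be queried like any other.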
In addition, with HeatWave ML, there is no need to move or reformat your data. Data and machine learning models never leave the MySQL Database Service, which saves you time and effort while keeping your data and models secure.
The general HeatWave ML workflow is described below:
When the ML_TRAIN routine is called, HeatWave ML calls the MySQL DB System where the training data resides. The training data is sent from the MySQL DB System and distributed across the HeatWave Cluster, which performs machine learning computation in parallel. See Section 3.4, “Training a Model”.
HeatWave ML analyzes the training data, trains an optimized machine learning model, and stores the model in a model catalog on the MySQL DB System. See Section 3.7.1, “The Model Catalog”.
The ML_PREDICT_* and ML_EXPLAIN_* routines use the trained model to generate predictions and explanations on test or unseen data. See Section 3.5, “Predictions”, and Section 3.6, “Explanations”.
Predictions and explanations are returned to the DB System and to the user or application that issued the query.
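Because trained models live in the model catalog and results come back through the DB System, both ends of this workflow can be inspected with ordinary SQL. A minimal sketch, assuming the catalog follows an ML_SCHEMA_<user> naming convention with a hypothetical user named user1, and that the ML_PREDICT_TABLE call shown earlier has already written heatwaveml_bench.census_predictions. Confirm the catalog schema and column names in Section 3.7.1, “The Model Catalog”.

```sql
-- List the models stored in the catalog after training.
-- 'user1' is a hypothetical MySQL user name.
SELECT model_id, model_handle, target_column_name, train_table_name
  FROM ML_SCHEMA_user1.MODEL_CATALOG;

-- Read back a few predictions returned to the DB System.
SELECT *
  FROM heatwaveml_bench.census_predictions
 LIMIT 5;
```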
HeatWave ML shares resources with HeatWave. HeatWave analytics queries are given priority over HeatWave ML queries. Concurrent HeatWave analytics and HeatWave ML queries are not supported. A HeatWave ML query must wait for HeatWave analytics queries to finish, and vice versa.
The HeatWave ML ML_TRAIN
routine leverages Oracle AutoML technology to automate the process
of training a machine learning model. Oracle AutoML replaces the
laborious and time-consuming tasks of the data analyst, whose
workflow is as follows:
Selecting a model from a large number of viable candidate models.
For each model, tuning hyperparameters.
Selecting only predictive features to speed up the pipeline and reduce over-fitting.
Ensuring the model performs well on unseen data (also called generalization).
Oracle AutoML automates this workflow, providing you with an
optimal model given a time budget. The Oracle AutoML pipeline used
by the HeatWave ML ML_TRAIN
routine has these stages:
Adaptive data reduction
Model and prediction explanations
Oracle AutoML also produces high-quality models efficiently, which is achieved through a scalable design and intelligent choices that reduce trials at each stage in the pipeline.
Scalable design: The Oracle AutoML pipeline is able to exploit both HeatWave internode and intranode parallelism, which improves scalability and reduces runtime.
Intelligent choices reduce trials in each stage: Algorithms and parameters are chosen based on dataset characteristics, which ensures that the model is accurate and efficiently selected. This is achieved using meta-learning throughout the pipeline.
For additional information about Oracle AutoML, refer to Yakovlev, Anatoly, et al. "Oracle AutoML: A Fast and Predictive AutoML Pipeline." Proceedings of the VLDB Endowment 13.12 (2020): 3166-3180.