The HeatWave AutoML ML_TRAIN routine leverages Oracle AutoML
technology to automate the process of training a machine learning
model. Oracle AutoML replaces the laborious and time-consuming
tasks of a data analyst, whose workflow typically consists of the
following:
Selecting a model from a large number of viable candidate models.
For each model, tuning hyperparameters.
Selecting only predictive features to speed up the pipeline and reduce overfitting.
Ensuring the model performs well on unseen data (also called generalization).
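The manual workflow above can be sketched as a brute-force search. The following is a minimal illustration in plain Python, not HeatWave AutoML code: the toy dataset, the two candidate models (a nearest-centroid classifier and k-nearest neighbors), the values of k, and all function names are assumptions chosen for the example. It tries every combination of model, hyperparameter, and feature subset, and scores each one on held-out rows to estimate generalization.

```python
import random
from itertools import combinations

def make_data(n=200, seed=0):
    """Toy binary dataset: feature 0 is predictive, feature 1 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = rng.gauss(2.0 * label, 1.0)  # depends on the label
        noise = rng.gauss(0.0, 3.0)           # unrelated to the label
        rows.append(([signal, noise], label))
    return rows

def knn_predict(train, x, k, feats):
    """Majority vote among the k nearest training rows, using only `feats`."""
    def dist(a):
        return sum((a[f] - x[f]) ** 2 for f in feats)
    nearest = sorted(train, key=lambda r: dist(r[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def centroid_predict(train, x, feats):
    """Assign the class whose mean (over `feats`) is closest to x."""
    best, best_d = None, float("inf")
    for cls in (0, 1):
        pts = [r[0] for r in train if r[1] == cls]
        d = sum((sum(p[f] for p in pts) / len(pts) - x[f]) ** 2 for f in feats)
        if d < best_d:
            best, best_d = cls, d
    return best

def accuracy(predict, valid):
    """Fraction of held-out rows classified correctly."""
    return sum(predict(x) == y for x, y in valid) / len(valid)

def manual_search(rows):
    # Hold out half of the rows to check performance on unseen data.
    train, valid = rows[: len(rows) // 2], rows[len(rows) // 2:]
    # Candidate models with their hyperparameter values (illustrative only).
    candidates = [("centroid", None)] + [("knn", k) for k in (1, 5, 15)]
    # Every non-empty subset of the two features.
    feature_sets = [fs for r in (1, 2) for fs in combinations(range(2), r)]
    results = []
    for name, k in candidates:
        for feats in feature_sets:
            if name == "knn":
                predict = lambda x, k=k, feats=feats: knn_predict(train, x, k, feats)
            else:
                predict = lambda x, feats=feats: centroid_predict(train, x, feats)
            results.append((accuracy(predict, valid), name, k, feats))
    return max(results, key=lambda r: r[0]), len(results)

best, trials = manual_search(make_data())
print(f"best (accuracy, model, k, features) = {best} after {trials} trials")
```

Even this tiny search needs one training and evaluation run per combination, and the count grows multiplicatively as models, hyperparameter values, and features are added.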
Oracle AutoML automates this workflow, providing you with an
optimal model given a time budget. The Oracle AutoML pipeline used
by the HeatWave AutoML ML_TRAIN routine has these stages:
Data preprocessing
Algorithm selection
Adaptive data reduction
Hyperparameter optimization
Model and prediction explanations
Oracle AutoML also produces high-quality models efficiently. It achieves this through a scalable design and through intelligent choices that reduce the number of trials at each stage of the pipeline:
Scalable design: The Oracle AutoML pipeline exploits both HeatWave internode and intranode parallelism, which improves scalability and reduces runtime.
Intelligent choices reduce trials in each stage: Algorithms and parameters are chosen based on dataset characteristics, so that an accurate model is selected efficiently. This is achieved using meta-learning throughout the pipeline.
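A rough trial count shows why deciding one thing per stage is cheaper than searching everything at once. The sketch below is an illustration only: the algorithm names, hyperparameter grids, and the rule of "one default-configuration trial per algorithm, then tune only the winner" are assumptions for the example, and it does not model the meta-learning that Oracle AutoML actually uses.

```python
from math import prod

# Hypothetical search space: algorithm name -> hyperparameter grid.
# These names and values are illustrative, not HeatWave AutoML's candidates.
SEARCH_SPACE = {
    "logistic_regression": {"C": [0.01, 0.1, 1, 10]},
    "random_forest": {
        "n_estimators": [50, 100, 200],
        "max_depth": [4, 8, 16],
    },
    "gradient_boosting": {
        "learning_rate": [0.01, 0.1, 0.3],
        "n_estimators": [50, 100, 200],
        "max_depth": [3, 6],
    },
}

def grid_size(grid):
    """Number of hyperparameter combinations in one algorithm's grid."""
    return prod(len(values) for values in grid.values())

def exhaustive_trials(space):
    """Train every hyperparameter combination of every algorithm."""
    return sum(grid_size(grid) for grid in space.values())

def staged_trials(space, selected):
    """One trial per algorithm to pick a winner, then tune only that winner."""
    return len(space) + grid_size(space[selected])

print(exhaustive_trials(SEARCH_SPACE))                   # 31
print(staged_trials(SEARCH_SPACE, "gradient_boosting"))  # 21
print(staged_trials(SEARCH_SPACE, "random_forest"))      # 12
```

The saving grows with the number of candidate algorithms, because the grids of the algorithms that are not selected are never explored.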
For additional information about Oracle AutoML, refer to Yakovlev, Anatoly, et al. "Oracle AutoML: A Fast and Predictive AutoML Pipeline." Proceedings of the VLDB Endowment 13.12 (2020): 3166-3180.