MySQL HeatWave User Guide

Chapter 1 Overview

HeatWave is a massively parallel, high-performance, in-memory query accelerator that improves MySQL performance by orders of magnitude for analytics workloads, mixed workloads, and machine learning.

A HeatWave Cluster consists of a MySQL DB System and HeatWave nodes. The MySQL DB System includes a HeatWave plugin that is responsible for cluster management, query scheduling, and returning query results to the MySQL DB System. HeatWave nodes store data in memory and process analytics and machine learning queries. Each HeatWave node hosts an instance of the HeatWave query processing engine (RAPID).

When you enable a HeatWave Cluster, analytics queries that meet certain prerequisites are automatically offloaded from the MySQL DB System to the HeatWave Cluster for accelerated processing. This enables you to run online transaction processing (OLTP), online analytical processing (OLAP), and mixed workloads from the same MySQL database without requiring extract, transform, and load (ETL) processes, and without modifying your applications. For more information about HeatWave's analytics capabilities, see Chapter 2, HeatWave.

Enabling a HeatWave Cluster also provides access to HeatWave Machine Learning (ML), a fully managed, highly scalable, cost-efficient machine learning solution for data stored in MySQL. HeatWave ML provides a simple SQL interface for training and using predictive machine learning models, and it is suitable for novice and experienced ML practitioners alike. Machine learning expertise, specialized tools, and algorithms are not required. With HeatWave ML, you can train a model with a single call to an SQL routine. Similarly, you can generate predictions with a single CALL or SELECT statement, which can be easily integrated with your applications.

With HeatWave ML, data and models never leave the MySQL Database Service, saving you time and effort while keeping your data and models secure. HeatWave ML is optimized for HeatWave shapes and scaling, and all HeatWave ML processing is performed on the HeatWave Cluster. ML computation is distributed among HeatWave nodes, taking advantage of HeatWave's scalability and massively parallel processing capabilities. For more information about HeatWave's machine learning capabilities, see Chapter 3, HeatWave ML.

Analytics and machine learning queries are issued from a MySQL client or application that interacts with the HeatWave Cluster by connecting to the MySQL DB System. Results are returned to the MySQL DB System and to the MySQL client or application that issued the query.

The number of HeatWave nodes required depends on data size and the amount of compression that is achieved when loading data into the HeatWave Cluster. A HeatWave Cluster supports up to 64 nodes.
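The relationship between data size, compression, and node count can be sketched as simple arithmetic. The following is a minimal illustration only: the `compression_ratio` and `node_capacity_gb` parameters are hypothetical inputs chosen for the example, not HeatWave settings, and the only figure taken from the text above is the 64-node limit.

```python
import math

MAX_NODES = 64  # cluster limit stated in the text above


def estimate_nodes(data_gb: float, compression_ratio: float, node_capacity_gb: float) -> int:
    """Return a rough node count for a data set.

    compression_ratio and node_capacity_gb are hypothetical inputs for this
    sketch; they are not HeatWave configuration values.
    """
    if data_gb < 0 or compression_ratio <= 0 or node_capacity_gb <= 0:
        raise ValueError("sizes and ratio must be positive")
    # Size of the data once loaded, under the assumed compression ratio.
    in_memory_gb = data_gb / compression_ratio
    # Round up to whole nodes; a cluster needs at least one node.
    nodes = max(1, math.ceil(in_memory_gb / node_capacity_gb))
    if nodes > MAX_NODES:
        raise ValueError(f"needs {nodes} nodes, above the {MAX_NODES}-node limit")
    return nodes
```

For example, under these assumptions `estimate_nodes(1000, 2.0, 100)` yields 5 nodes. Such a calculation is only a rough guide and is no substitute for an estimate based on the actual data.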

Data that is loaded into HeatWave is automatically persisted to OCI Object Storage for fast recovery in case of a HeatWave Cluster failure.

HeatWave network traffic is fully encrypted.

Figure 1.1 HeatWave Architecture

HeatWave Architectural Features

In-Memory Hybrid-Columnar Format

HeatWave stores data in main memory in a hybrid columnar format. HeatWave's hybrid approach achieves the benefits of columnar format for query processing, while avoiding the materialization and update costs associated with pure columnar format. Hybrid columnar format enables the use of efficient query processing algorithms designed to operate on fixed-width data, and permits vectorized query processing.
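As a generic illustration of the idea, the sketch below groups rows into chunks and stores each chunk column by column in fixed-width arrays. This is one common reading of "hybrid columnar" and is an assumption; HeatWave's actual in-memory layout is not described here, and the function names are hypothetical.

```python
from array import array


def to_hybrid(rows, chunk_rows):
    """Split rows (tuples of ints) into chunks and store each chunk column-wise."""
    chunks = []
    for start in range(0, len(rows), chunk_rows):
        group = rows[start:start + chunk_rows]
        # One fixed-width signed 64-bit integer array per column in the chunk.
        chunks.append([array("q", col) for col in zip(*group)])
    return chunks


def column_sum(chunks, col_index):
    """Sum one column by scanning only that column's array in every chunk."""
    return sum(sum(chunk[col_index]) for chunk in chunks)
```

Because each column within a chunk is a contiguous run of fixed-width values, an aggregate such as `column_sum` touches only the column it needs, while all columns of a given row remain together in the same chunk.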

Massively Parallel Architecture

HeatWave's massively parallel architecture is enabled by internode and intranode partitioning of data. Each node within a HeatWave Cluster, and each CPU core within a node, processes the partitioned data in parallel. HeatWave is capable of scaling to thousands of cores. This massively parallel architecture, combined with high-fanout, workload-aware partitioning, accelerates query processing.
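The two levels of partitioning can be pictured with a small sketch. It is an illustration under stated assumptions, not HeatWave's algorithm: hash partitioning on a single key with CRC32 is a stand-in technique, and `partition` and `parallel_sum` are hypothetical names.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, num_nodes, cores_per_node):
    """Assign each row to a (node, core) slot by hashing its key column."""
    placement = defaultdict(list)
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        node = h % num_nodes                     # internode partitioning
        core = (h // num_nodes) % cores_per_node  # intranode partitioning
        placement[(node, core)].append(row)
    return placement


def parallel_sum(placement, column):
    """Compute a partial sum per partition concurrently, then merge the partials."""
    def partial(part):
        return sum(r[column] for r in part)

    with ThreadPoolExecutor() as pool:
        return sum(pool.map(partial, placement.values()))
```

Each partition is processed independently and only the small partial results are combined at the end, which is the property that lets work spread across nodes and cores.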

Figure 1.2 HeatWave Massively Parallel Architecture

Push-Based Vectorized Query Processing

HeatWave processes queries by pushing vector blocks (slices of columnar data) through the query execution plan from one operator to another. A push-based execution model avoids deep call stacks and saves valuable resources compared to tuple-based processing models.
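A push-based pipeline can be sketched in a few lines. The operator classes below (`Filter`, `SumSink`) and the `scan` function are hypothetical names for illustration; they show the general execution model, not HeatWave's internal operators.

```python
class SumSink:
    """Terminal operator: accumulates the sum of every block pushed to it."""

    def __init__(self):
        self.total = 0
        self.done = False

    def push(self, block):
        self.total += sum(block)

    def finish(self):
        self.done = True


class Filter:
    """Keeps the values that satisfy a predicate and pushes them downstream."""

    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, block):
        selected = [v for v in block if self.predicate(v)]
        if selected:
            self.downstream.push(selected)

    def finish(self):
        self.downstream.finish()


def scan(column, block_size, downstream):
    """Source operator: slices a column into blocks and pushes each one."""
    for start in range(0, len(column), block_size):
        downstream.push(column[start:start + block_size])
    downstream.finish()
```

A pipeline is assembled from the sink backwards, for example `scan(values, 1024, Filter(lambda v: v > 0, SumSink()))`. The source drives execution and each operator handles a whole block per call, rather than a consumer pulling one tuple at a time through nested calls.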

Scale-Out Data Management

When analytics data is loaded into HeatWave, the HeatWave Storage Layer automatically persists the data to OCI Object Storage for fast recovery in case of a HeatWave node or cluster failure. Data is automatically restored by the HeatWave Storage Layer when HeatWave recovers a failed node or cluster. This automated, self-managing storage layer scales to the size required for your HeatWave Cluster and operates independently in the background. The time required to reload data is constant regardless of data size or HeatWave Cluster size.

Native MySQL Integration

Native integration with MySQL provides a single data management platform for OLTP, OLAP, mixed workloads, and machine learning. HeatWave is designed as a pluggable MySQL storage engine, which enables management of both MySQL and HeatWave using the same interfaces.

Changes to analytics data on the MySQL DB System are automatically propagated to HeatWave nodes in real time, which means that queries always have access to the latest data. Change propagation is performed automatically by a lightweight algorithm.

Users and applications interact with HeatWave through the MySQL DB System using standard tools and standards-based ODBC/JDBC connectors. HeatWave supports the same ANSI SQL standard and ACID properties as MySQL, as well as the most commonly used data types. This support enables existing applications to use HeatWave without modification, allowing for quick and easy integration.

MySQL Autopilot

MySQL Autopilot automates many of the most important and often challenging aspects of achieving exceptional query performance at scale, including cluster provisioning, loading data, query processing, and failure handling. It uses advanced techniques to sample data, collect statistics on data and queries, and build machine learning models to model memory usage, network load, and execution time. The machine learning models are used by MySQL Autopilot to execute its core capabilities. MySQL Autopilot makes the HeatWave query optimizer increasingly intelligent as more queries are executed, resulting in continually improving system performance.

Autopilot focuses on four aspects of the HeatWave service life cycle:

System Setup

  • Auto Provisioning

    Estimates the number of HeatWave nodes required by sampling the data, which means that manual cluster size estimations are not necessary. See HeatWave Cluster Size Estimates.

Data Load

Query Execution

  • Auto Query Plan Improvement

    Uses statistics from previously executed queries to improve future query execution plans. See Auto Query Plan Improvement.

  • Auto Query Time Estimation

    Estimates query execution time, allowing you to determine how a query might perform without having to run the query. Runtime estimates are provided by the Advisor Query Insights feature. See “Query Insights”.

  • Auto Change Propagation

    Auto Change Propagation intelligently determines the optimal time when changes to data on the MySQL DB System should be propagated to the HeatWave Storage Layer.

  • Auto Scheduling

    Prioritizes queries in an intelligent way to reduce overall query execution wait times. See Auto Scheduling.

Failure Handling

  • Auto Error Recovery

    Auto Error Recovery provisions new HeatWave nodes and reloads data from the HeatWave Storage Layer if one or more HeatWave nodes become unresponsive due to a software or hardware failure. See HeatWave Cluster Failure and Recovery.
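To make the Auto Scheduling item above more concrete, the sketch below shows how ordering queries by estimated run time can reduce total waiting. Shortest-estimate-first is an assumed policy used for illustration; the text does not specify the policy Auto Scheduling applies, and `QueryQueue` and `total_wait` are hypothetical names.

```python
import heapq
import itertools


class QueryQueue:
    """Releases the query with the smallest estimated run time first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserves arrival order

    def submit(self, name, estimated_seconds):
        heapq.heappush(self._heap, (estimated_seconds, next(self._arrival), name))

    def next_query(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def total_wait(run_times):
    """Sum of the time each query waits before starting, when run back to back."""
    waited, clock = 0, 0
    for t in run_times:
        waited += clock
        clock += t
    return waited
```

With three queries estimated at 60, 1, and 5 seconds, running them in arrival order gives `total_wait([60, 1, 5]) == 121`, while shortest-first gives `total_wait([1, 5, 60]) == 7`.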