HeatWave is a massively parallel, high-performance, in-memory query accelerator that improves MySQL performance by orders of magnitude for analytics workloads, mixed workloads, and machine learning.
A HeatWave Cluster consists of a MySQL DB System and HeatWave nodes.
The MySQL DB System includes a HeatWave plugin that is responsible
for cluster management, query scheduling, and returning query
results to the MySQL DB System. HeatWave nodes store data in
memory and process analytics and machine learning queries. Each
HeatWave node hosts an instance of the HeatWave query processing engine.
When you enable a HeatWave Cluster, analytics queries that meet certain prerequisites are automatically offloaded from the MySQL DB System to the HeatWave Cluster for accelerated processing. This enables you to run online transaction processing (OLTP), online analytical processing (OLAP), and mixed workloads from the same MySQL database without requiring extract, transform, and load (ETL) and without modifying your applications. For more information about HeatWave's analytics capabilities, see Chapter 2, HeatWave.
Enabling a HeatWave Cluster also provides access to HeatWave Machine Learning (ML), which
is a fully managed, highly scalable, cost-efficient, machine
learning solution for data stored in MySQL. HeatWave ML provides a
simple SQL interface for training and using predictive machine
learning models, which can be used by novice and experienced ML
practitioners alike. Machine learning expertise, specialized
tools, and algorithms are not required. With HeatWave ML, you can
train a model with a single call to an SQL routine. Similarly, you
can generate predictions with a single SELECT statement, which can be
easily integrated with your applications.
With HeatWave ML, data and models never leave the MySQL Database Service, saving you time and effort while keeping your data and models secure. HeatWave ML is optimized for HeatWave shapes and scaling, and all HeatWave ML processing is performed on the HeatWave Cluster. ML computation is distributed among HeatWave nodes, taking advantage of HeatWave's scalability and massively parallel processing capabilities. For more information about HeatWave's machine learning capabilities, see Chapter 3, HeatWave ML.
Analytics and machine learning queries are issued from a MySQL client or application that interacts with the HeatWave Cluster by connecting to the MySQL DB System. Results are returned to the MySQL DB System and to the MySQL client or application that issued the query.
The number of HeatWave nodes required depends on data size and the amount of compression that is achieved when loading data into the HeatWave Cluster. A HeatWave Cluster supports up to 64 nodes.
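As a rough illustration of that sizing arithmetic, the following Python sketch divides an assumed in-memory data size by an assumed per-node capacity. The function name, the compression ratio, and the per-node capacity are hypothetical values chosen for illustration and are not figures from HeatWave; only the 64-node limit comes from the text above. In practice, an estimate based on sampling the actual data is more reliable than this back-of-the-envelope calculation.

```python
import math


def estimate_node_count(raw_data_gib: float,
                        compression_ratio: float,
                        node_capacity_gib: float,
                        max_nodes: int = 64) -> int:
    """Return a rough node count for a data set (illustrative only).

    compression_ratio: raw size divided by in-memory size (an assumption).
    node_capacity_gib: data one node can hold (an assumption, not an
        official figure).
    max_nodes: cluster limit; 64 is the limit stated in the text.
    """
    if raw_data_gib < 0 or compression_ratio <= 0 or node_capacity_gib <= 0:
        raise ValueError("sizes must be non-negative and ratios positive")
    in_memory_gib = raw_data_gib / compression_ratio
    # Always provision at least one node, and round partial nodes up.
    nodes = max(1, math.ceil(in_memory_gib / node_capacity_gib))
    if nodes > max_nodes:
        raise ValueError(f"data needs {nodes} nodes, limit is {max_nodes}")
    return nodes
```

For example, 1000 GiB of raw data with an assumed 2:1 compression ratio and an assumed 100 GiB of usable capacity per node yields an estimate of five nodes.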
Data that is loaded into HeatWave is automatically persisted to OCI Object Storage for fast recovery in case of a HeatWave Cluster failure.
HeatWave network traffic is fully encrypted.
HeatWave stores data in main memory in a hybrid columnar format. HeatWave's hybrid approach achieves the benefits of columnar format for query processing, while avoiding the materialization and update costs associated with pure columnar format. Hybrid columnar format enables the use of efficient query processing algorithms designed to operate on fixed-width data, and permits vectorized query processing.
HeatWave's massively parallel architecture is enabled by internode and intranode partitioning of data. Each node within a HeatWave Cluster, and each CPU core within a node, processes the partitioned data in parallel. HeatWave is capable of scaling to thousands of cores. This massively parallel architecture, combined with high-fanout, workload-aware partitioning, accelerates query processing.
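The idea of internode and intranode partitioning can be sketched in a few lines of Python. This is a conceptual model only, not HeatWave's implementation: the two-level hash scheme, the function names, and the use of a thread pool to stand in for nodes and cores are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key_index, num_nodes, cores_per_node):
    """Two-level hash partitioning: pick a node, then a core within it."""
    parts = defaultdict(list)
    for row in rows:
        h = zlib.crc32(str(row[key_index]).encode())
        node = h % num_nodes                      # internode partition
        core = (h // num_nodes) % cores_per_node  # intranode partition
        parts[(node, core)].append(row)
    return parts


def partial_sum(rows, value_index):
    """Work done independently on one partition."""
    return sum(row[value_index] for row in rows)


def parallel_sum(rows, key_index, value_index, num_nodes=2, cores_per_node=4):
    """Aggregate each partition separately, then combine the partial results."""
    parts = partition(rows, key_index, num_nodes, cores_per_node)
    # Threads only simulate the nodes and cores here; they illustrate the
    # data flow, not real multi-node execution.
    with ThreadPoolExecutor(max_workers=num_nodes * cores_per_node) as pool:
        partials = pool.map(lambda p: partial_sum(p, value_index),
                            parts.values())
        return sum(partials)
```

Each partition is processed without reference to any other, which is what allows the work to be spread across nodes and cores before the partial results are combined.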
HeatWave processes queries by pushing vector blocks (slices of columnar data) through the query execution plan from one operator to another. A push-based execution model avoids deep call stacks and saves valuable resources compared to tuple-based processing models.
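A push-based model can be illustrated with a toy pipeline in Python. The sketch below is an assumption-laden teaching example, not HeatWave code: the class names, the dictionary representation of a vector block, and the block size are all hypothetical. The scan drives execution by pushing each block to the next operator, so no operator pulls rows one at a time from the operator below it.

```python
from typing import Dict, List

# A vector block: column name -> slice of that column's values.
Block = Dict[str, List[float]]


class SumSink:
    """Terminal operator: accumulates the sum of one column."""

    def __init__(self, column: str):
        self.column = column
        self.total = 0.0

    def push(self, block: Block) -> None:
        self.total += sum(block[self.column])


class Filter:
    """Keeps rows where `column` > `threshold`, then pushes downstream."""

    def __init__(self, column: str, threshold: float, downstream):
        self.column = column
        self.threshold = threshold
        self.downstream = downstream

    def push(self, block: Block) -> None:
        keep = [i for i, v in enumerate(block[self.column])
                if v > self.threshold]
        if keep:  # skip blocks with no qualifying rows
            self.downstream.push(
                {name: [vals[i] for i in keep] for name, vals in block.items()}
            )


def scan(table: Block, block_size: int, downstream) -> None:
    """Source operator: slices columns into blocks and pushes each one."""
    n = len(next(iter(table.values()))) if table else 0
    for start in range(0, n, block_size):
        downstream.push(
            {name: vals[start:start + block_size]
             for name, vals in table.items()}
        )
```

A query such as "sum qty where price > 10" would be wired as `scan(table, 2, Filter("price", 10, SumSink("qty")))`, with each operator processing a whole block per call rather than one tuple per call.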
When analytics data is loaded into HeatWave, the HeatWave Storage Layer automatically persists the data to OCI Object Storage for fast recovery in case of a HeatWave node or cluster failure. Data is automatically restored by the HeatWave Storage Layer when HeatWave recovers a failed node or cluster. This automated, self-managing storage layer scales to the size required for your HeatWave Cluster and operates independently in the background. The time required to reload data is constant regardless of data size or HeatWave Cluster size.
Native integration with MySQL provides a single data management platform for OLTP, OLAP, mixed workloads, and machine learning. HeatWave is designed as a pluggable MySQL storage engine, which enables management of both MySQL and HeatWave using the same interfaces.
Changes to analytics data on the MySQL DB System are automatically propagated to HeatWave nodes in real time, which means that queries always have access to the latest data. Change propagation is performed automatically by a lightweight algorithm.
Users and applications interact with HeatWave through the MySQL DB System using standard tools and standards-based ODBC/JDBC connectors. HeatWave supports the same ANSI SQL standard and ACID properties as MySQL and the most commonly used data types. This support enables existing applications to use HeatWave without modification, allowing for quick and easy integration.
MySQL Autopilot automates many of the most important and often challenging aspects of achieving exceptional query performance at scale, including cluster provisioning, loading data, query processing, and failure handling. It uses advanced techniques to sample data, collect statistics on data and queries, and build machine learning models to model memory usage, network load, and execution time. The machine learning models are used by MySQL Autopilot to execute its core capabilities. MySQL Autopilot makes the HeatWave query optimizer increasingly intelligent as more queries are executed, resulting in continually improving system performance.
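Autopilot focuses on four aspects of the HeatWave service life cycle (system setup, data load, query execution, and failure handling) through the following capabilities: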
Auto Provisioning
Estimates the number of HeatWave nodes required by sampling the data, which means that manual cluster size estimations are not necessary. See HeatWave Cluster Size Estimates.
Auto Parallel Load
Optimizes load time and memory usage by predicting the optimal degree of parallelism for each table loaded into HeatWave. See Section 2.2.3, “Loading Data Using Auto Parallel Load”.
Auto Encoding
Determines the optimal encoding for string column data, which minimizes the required cluster size and improves query performance. See “Auto Encoding”.
Auto Data Placement
Recommends how tables should be partitioned in memory to achieve the best query performance, and estimates the expected performance improvement. See “Auto Data Placement”.
Auto Query Plan Improvement
Uses statistics from previously executed queries to improve future query execution plans. See Auto Query Plan Improvement.
Auto Query Time Estimation
Estimates query execution time, allowing you to determine how a query might perform without having to run the query. Runtime estimates are provided by the Advisor Query Insights feature. See “Query Insights”.
Auto Change Propagation
Auto Change Propagation intelligently determines the optimal time when changes to data on the MySQL DB System should be propagated to the HeatWave Storage Layer.
Auto Scheduling
Prioritizes queries in an intelligent way to reduce overall query execution wait times. See Auto Scheduling.
Auto Error Recovery
Auto Error Recovery provisions new HeatWave nodes and reloads data from the HeatWave Storage Layer if one or more HeatWave nodes become unresponsive due to a software or hardware failure. See HeatWave Cluster Failure and Recovery.
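To see why query prioritization can reduce overall wait time, as described for Auto Scheduling in the list above, consider the following Python sketch. It compares first-come-first-served ordering with a shortest-estimated-first ordering on a single serial executor. This is a textbook scheduling illustration under simplifying assumptions; the function names are hypothetical, and the text does not state which policy HeatWave actually applies.

```python
def average_wait(run_times):
    """Average time each query waits before it starts on one serial executor."""
    total_wait = 0.0
    elapsed = 0.0
    for t in run_times:
        total_wait += elapsed  # this query waited for everything before it
        elapsed += t
    return total_wait / len(run_times) if run_times else 0.0


def shortest_first(estimated_run_times):
    """Order queries by estimated run time, shortest first (assumed policy)."""
    return sorted(estimated_run_times)
```

With estimated run times of 60, 1, and 2 seconds in arrival order, `average_wait` is higher for the arrival order than for `shortest_first`, because the two short queries no longer sit behind the long one.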