WL#4292: Pushdown of query fragments to storage engines (QFPD)
Affects: Server-7.x
—
Status: Un-Assigned
------------------------------------------------------------------------------- Table of Contents ------------------------------------------------------------------------------- 1. Technical case 2. Expected benefits 1. Technical case ------------------------------------------------------------------------------- The MySQL pluggable Storage Engine (SE) architecture enables the MySQL server to serve as a SQL query processor front-end for storage engines with highly varying processing capabilities. Some SEs provide only table/index scan capabilities or single-table filters, others may provide more powerful query execution primitives such as joins, grouping and aggregation, but no optimization. Finally, other SEs are full-fledged SQL query processors that are capable of optimizing and executing a subset of SQL, or complete SQL. Many of the already existing SEs are capable of performing various database operations orders of magnitude faster than the MySQL server based on specialized algorithms and hardware, however, due to its limited SE API, currently it is very hard to make the MySQL server utilize these capabilities. The current SE API (primarily the handler interface) provides a fixed set of data access primitives such as table/index scan, index range scans, index lookups and few more. This interface limits the ways in which the MySQL server can use the capabilities of each SE. For instance even if several tables are stored in the same SE, and this SE is capable of efficient parallel join execution, the current SE API doesn't allow the server to utilize this capability, because it enforces table-by-table access through a fixed set of operations. In order to use the advanced capabilities of an SE, it is necessary to somehow instruct it perform the arbitrarily complex computations it is capable of executing more efficiently than the MySQL server. The goal of this task is to enable the MySQL server and its SEs to communicate and agree on the coordinated execution of arbitrarily complex computations for the execution of SQL queries. Given that in an SQL DBMS computations are specified either declaratively via queries, or via query execution plans (still equivalent to a query), in the scope of this work, the computations exchanged by the server and its SEs will be called "Query Fragments". The process of instructing an SE to execute some query fragment is called "Pushdown", hence the name of the task. There are two main subgoals: * provide an API to communicate query fragments between the server and the SE, and * to improve the MySQL optimizer so that it can plan query execution in a way that can utilize the newly exposed capabilities of the SEs, and * to improve the MySQL executioner so that it can execute efficiently such plans. 2. Expected benefits ------------------------------------------------------------------------------- The implementation of this task would result in two major benefits: * it will enable MySQL and it Storage Engine providers to deliver up to orders of magnitude higher performance and scalability, utilizing all the knowledge of the community invested in building high-performance or otherwise specialized SEs, and * it will make it possible to build multi-database systems (a special case of which are federated systems) for the efficient online integration of heterogeneous data sources.
WL#4533: QFPD: Abstract Query tree (AQT)
WL#4535: QFPD: Storage Engine Pushdown API (SEPD API)
WL#4536: QFPD: Query decomposition and optimization for query fragments
WL#4537: QFPD: Virtual Tables
WL#4535: QFPD: Storage Engine Pushdown API (SEPD API)
WL#4536: QFPD: Query decomposition and optimization for query fragments
WL#4537: QFPD: Virtual Tables
------------------------------------------------------------------------------- Table of Contents ------------------------------------------------------------------------------- 1. Decomposition into sub-tasks. 2. Overview of query processing with query fragment pushdown. 3. Example scenarios. 1. Decomposition into sub-tasks. ------------------------------------------------------------------------------- The implementation of QFPD is decomposed into the following main tasks, which will be implemented roughly in the order below. It will hardly be possible do develop all tasks in a sequence, because there are dependencies. The subtasks are ordered with one goal in mind - to quickly build a prototype that allows pushdown and execution of a subset of all possible fragments, specifically of query fragments with joins. 1. WL#4533: Abstract Query Tree (AQT) Goal: Provide the backbone for QFPD, namely the abstraction of query fragments by which the MySQL server and SEs can communicate query fragments, and agree on their execution. Status: - in progress - complete initial design ready (HLS/LLD), however it is out of date, - currently in the process of prototyping. 2. WL#4535: Storage Engine Pushdown API (SEPD API) Goal: Implement an API throught which query fragments can be communicated to SEs, and SEs can communicate back with the server both during query optimization and query execution. Status: - waits for WL#4533 - HLD written 3. WL#4537: Virtual Tables Goal: Implement virtual tables through which SEs will send the result stream from pushed query fragments into the server. Once a query fragment is pushed to an SE, the query fragment is substituted by a virtual table that logically contains the result of the fragment. Physically, the virtual table will serve as the interface through which an SE communicates the result data stream from executing a fragment to the server. Status: - HLD written, - WL#3288 is supposed to implement the main part of the functionality, - waits for WL#4533, WL#4535. 4. WL#4536: Query decomposition and optimization with query fragments Goal: Extend the query optimizer to support optimal decomposition of queries into fragments via negotiation with the SEs, and generation of a final reduced query that composes the result streams from all fragments into a final query result. This is the most complex task of all, and it will be implemented in several stages (to be defined). The first stage is to simply try to push a complete query plan as a fragment, and to substitute this fragment with a Virtual Table (from WL#4537). Status: - HLD written, - done some initial literature research, - waits for WL#4533, WL#4535, WL#4537, (WL#4553 for a complete solution). 5. WL#4553: Cost model calibration Goal: Storage engines may use completely different cost models and cost units. This task makes it possible to compare the cost estimates of SEs and the server estimates. Status: - HLD written, - done some initial literature research. 2. Overview of query processing with query fragment pushdown. ------------------------------------------------------------------------------- TODO 3. Example scenarios. ------------------------------------------------------------------------------- TODO
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.