WL#1763: Avoid creating temporary table in UNION ALL
Status: Complete
Currently, union queries always use a temporary table to store the result before it is returned to the user. This worklog is about avoiding creating a temporary table for the result of UNION ALL when there is no need for it, i.e., when there is no top-level ORDER BY. This will save the cost of creating, writing to and reading from the temporary table, which may be put on disk if the result is large. This will also reduce the need for disk space for large unions, and the client will get the first rows quicker. Bug#11758470 DO NOT CREATE TEMPORARY TABLES FOR UNION ALL RB#2651 User Documentation ================== http://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-3.html
Description ----------- Currently, a temporary table is always created to hold the result of a union query. If the union is a UNION ALL rather than UNION DISTINCT, there is no filtering of duplicates. If the UNION also doesn't have an ORDER BY clause, there is no need for a temporary table, and the result can be sent directly to the client. Since the queries in the union may have different types in the same column, some values may have to be converted. Currently, this is done when writing to and reading from the temporary table. When skipping the temporary table, all queries must either have the same column types, or types must be converted behind the scenes so that the query result has one type for each column. Skipping temporary tables ------------------------- Setting up a temporary table is a two-stage process: 1) Creating - Allocate a TABLE structure and fill it - Done by create_tmp_table() 2) Instantiating - Setting up internal structures, etc. - Done by instantiate_tmp_table() In this worklog, step 2 is skipped. Step 1, creating the temporary table is still done so that the fields of the table can be used to do typecasting. Since the table is never instantiated, no rows are ever written to it. Changes to syntax, results, errors and warnings ----------------------------------------------- There are no changes to syntax, results, errors or warnings as a result of this worklog. If some values must be typecast, the query should return the same type (and value) as when temporary tables are used. Optimizer trace and EXPLAIN output will change to show that temporary tables are not used for affected queries. This means that the UNION RESULT query block is no longer part of the EXPLAIN output, since this block is the query that reads data from the temporary table. Typecasting to a single output type for each column should happen behind the scenes and should not be visible in any output. Rules ----- 1) Skip instantiating a temporary table and send output of the union directly to the client if all these conditions apply: - The union is not DISTINCT - There is no global ORDER BY clause - The union is not the top level query block of an INSERT ... SELECT statement - The union is not the top level query block of a REPLACE ... SELECT statement The data is written to and read from a field of the temporary table to do type conversion if needed. 2) Otherwise, use a temporary table. Examples -------- The EXPLAIN output should show that there is no query reading the UNION result from a temporary table. CREATE TABLE t1 (a INT, b INT, c INT); CREATE TABLE t2 (a INT, b INT, c TINYINT); INSERT INTO t1 VALUES (1,1,1), (2,2,2), (3,3,3); INSERT INTO t2 VALUES (1,1,1), (2,2,2), (3,3,3); EXPLAIN SELECT a, b, c FROM t1 UNION ALL SELECT a, b, c FROM t2; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 3 NULL 2 UNION t2 ALL NULL NULL NULL NULL 3 NULL EXPLAIN EXTENDED output will not show that typecast items are used: EXPLAIN EXTENDED SELECT a, b, c FROM t1 UNION ALL SELECT a, b, c FROM t2; id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 3 100.00 NULL 2 UNION t2 ALL NULL NULL NULL NULL 3 100.00 NULL Warnings: Note 1003 /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`c` AS `c` from `test`.`t1` union all /* select#2 */ select `test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t2`.`c` AS `c` from `test`.`t2` EXPLAIN FORMAT=JSON output will show that no temporary tables are used, and will not show typecast items: EXPLAIN FORMAT=JSON SELECT a, b, c FROM t1 UNION ALL SELECT a, b, c FROM t2; EXPLAIN { "query_block": { "union_result": { "using_temporary_table": false, "query_specifications": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 1, "table": { "table_name": "t1", "access_type": "ALL", "rows": 3, "filtered": 100 } } }, { "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "table": { "table_name": "t2", "access_type": "ALL", "rows": 3, "filtered": 100 } } } ] } } } Warnings: Note 1003 /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`c` AS `c` from `test`.`t1` union all /* select#2 */ select `test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t2`.`c` AS `c` from `test`.`t2` DROP TABLE t1, t2;
Overall design ============== A new subclass of select_union, select_union_direct, implements result handling for UNION ALL without temporary tables. A select_union_direct object is created for each UNION query. The object is a wrapper around another select_result (e.g., select_send or select_insert) and receives the output from each query of the union in sequence, casts fields to the correct data type and forwards the rows to the wrapped select_result. Detailed design =============== New classes ----------- select_union_direct (subclass of select_union) Result class for UNION ALL without temporary table. This is a wrapper around another select_result, allowing several st_select_lexes to deliver their output into the same result object. Most member function calls are forwarded to the wrapped select_result, but some can't be called more than once (e.g., prepare()). These are implemented so that the wrapped function is called only the first time. At construction time, the st_select_lexes that use the select_union_direct object is known. A pointer to the last st_select_lex of the union is kept to know when send_eof() should forward its call to the wrapped select_result. A temporary table is created to do the typecasting, but it is never instantiated. When a query returns a row, Item_fields for the temporary table are filled and read in order to do the type conversion, but no rows are written to the table, and the row is forwarded to the wrapped select_result immediately. New functions ------------- select_result::change_result() Change the wrapped select_result object of wrapper select_results. The introduction of this function leads to changes in JOIN::change_result(), since it now needs to make this call to its select_result. As a consequence, the function signatures for change_result() becomes the same for all classes that implement it. st_select_lex_unit::union_needs_tmp_table() Assuming that the unit represents a union, return true if the unit needs a temporary table to store the result of queries. The correct way to check if a st_select_lex_unit represents a union that needs a temporary table, is to check (is_union() && union_needs_tmp_table()). Changes to st_select_lex_unit ----------------------------- The choice of select_result defines how results are returned (to the client with select_send, to a table with select_insert, etc.). This select_result is taken as parameter sel_result to st_select_lex_unit::prepare(), which decides on the select_result for each st_select_lex. The choice is between select_union, select_union_direct or the value of sel_result. Despite the name, select_union is used not only for union queries, but also for other queries that need to collect their results in a temporary table. If the unit represents a union query that doesn't need temporary tables, a select_union_direct is created. There is no need for a fake_select_lex. The parser creates one, but st_select_lex::prepare() removes it when it finds out that a temporary table is not needed. In st_select_lex::prepare(), st_select_lex::exec() and st_select_lex::explain(), there are numerous steps for setting up and executing fake_select_lex, and these are skipped by testing if, after choosing a select_result, fake_select_lex != NULL. Changes to EXPLAIN code ----------------------- Traditional explain is more or less untouched since it just loops through the st_select_lexes and explains them one by one. Checks for fake_select_lex != NULL are added to support unions without fake_select_lex. JSON explain output wraps the details of the union query in an "object" representing the fake_select_lex. When explaining a union without a temporary table, since there is no fake_select_lex, a wrapping "object" in the JSON output is made by union_ctx::format_body().
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.