In some cases, the server creates internal temporary tables while processing statements. These tables could be stored in memory or on disk – the first option is preferred but there exist some limitations. One of such restrictions was presence of TEXT or BLOB columns in the table; as in-memory storage engines (MEMORY and TempTable) did not supported these types server had to use the on-disk engine (InnoDB or MyISAM). But this is no longer true.
The Change
The TempTable engine was extended with the support for columns with BLOB-like storage – that includes BLOB, TEXT, JSON or geometry types. The change reuses solutions that were created before to provide efficient storage for VARCHAR and VARBINARY columns:
- When there are any columns with BLOB-like storage then the same row format as for tables with VARCHAR/VARBINARY data is used: each cell containing a NULL flag, the data length, and a data pointer.
- Column values are placed in consecutive order after the array, in a single region of memory, without padding.
- Rules of use of temporary files as an overflow mechanism remain unchanged.
After this change is introduced server would store internal temporary tables with BLOBs using TempTable engine (if it configured as the one to be used for in-memory temporary tables; it is the default setting). It is still possible to force storing internal temporary tables on disk with the big_tables system variable.
Benchmarks
The main goal of this change was to get performance improvement when server is using internal temporary tables with BLOB columns. At the same time the impact on tables not using such types (supported already) should be none or minimal.
No impact on tables without BLOBs
Sysbench tool was used to measure the impact of change for tables without BLOB-like storage. All results for such tables were within +/-1% difference between original and modified version showing that there is none or minimal impact.
Performance gain on tables with JSON colums
The sysbench was also used to measure performance gain for tables with JSON columns. The example results below are from tests that were running on machine with two E5-2699 v4 @ 2.20ghz CPUs, the data size used was 127GB, range size set to 500.
Benchmark | Gain | ||
16 threads | 32 threads | 64 threads | |
JSON_SQL_RW | 33.72% | 47.63% | 71.21% |
JSON_SQL_RO | 89.51% | 127.69% | 215.08% |
JSON_SQL_DISTINCT_RANGE | 388.31% | 530.95% | 791.24% |
Similar benchmarks (with smaller data size) were also done on less powerful machine running Windows:
Benchmark | Gain | ||
16 threads | 32 threads | 64 threads | |
JSON_SQL_RW | 31.55% | 41.30% | 48.88% |
JSON_SQL_RO | 42.85% | 41.09% | 45.33% |
JSON_SQL_DISTINCT_RANGE | 107.27% | 96.39% | 95.67% |
Custom test
Additional custom test was also created to check the performance and behavior on different data sets. Three tables were prepared:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
CREATE TABLE t1 (id INT NOT NULL, value TEXT NOT NULL); CREATE TABLE t2 (id INT NOT NULL, value TEXT NOT NULL); CREATE TABLE t3 (value TEXT NOT NULL); # Fill t1 with 20 rows, each repeated 10000 times: INSERT INTO t1 VALUES (repeat_index % 10, REPEAT('0', 10000)); ... INSERT INTO t1 VALUES (repeat_index % 10, REPEAT('J', 10000)); # Fill t2 with 10 x 10.000 rows, each with unique value): INSERT INTO t2 VALUES (repeat_index, REPEAT('0', 10000 + repeat_index)); ... INSERT INTO t2 VALUES (repeat_index, REPEAT('9', 10000 + repeat_index)); # Fill t3 with 5 rows, each repeated 10 times: INSERT INTO t3 VALUES (REPEAT('0', 10000)); ... INSERT INTO t3 VALUES (REPEAT('4', 10000)); |
Then following queries were executed:
1 2 3 4 5 6 7 |
SELECT COUNT(DISTINCT value) FROM t1; SELECT id,SUBSTR(value,1,1000) FROM t1 GROUP BY id,SUBSTR(value,1,1000) ORDER BY id; SELECT id,SUBSTR(value,1,2000) FROM t1 GROUP BY id,SUBSTR(value,1,2000) ORDER BY id; SELECT COUNT(DISTINCT value) FROM t2; SELECT id,SUBSTR(value,1,1000) FROM t2 GROUP BY id,SUBSTR(value,1,1000) ORDER BY id; SELECT id,SUBSTR(value,1,2000) FROM t2 GROUP BY id,SUBSTR(value,1,2000) ORDER BY id; SELECT COUNT(DISTINCT v1.value, v2.value, v3.value) FROM t3 AS v1, t3 AS v2, t3 AS v3; |
The results on i5-4690K machine, in seconds:
Query | Temptable | InnoDB | Gain |
DISTINCT, repeating values | 16.75 | 17.10 | 2.02% |
GROUP BY, repeating values | 3.04 | 3.30 | 7.68% |
GROUP BY, repeating values | 4.67 | 4.94 | 5.50% |
DISTINCT, unique values | 3.40 | 8.43 | 59.66% |
GROUP BY, unique values, substring repeating | 1.58 | 2.57 | 38.62% |
GROUP BY, unique values, substring repeating | 2.16 | 3.70 | 41.78% |
JOIN | 35.46 | 37.98 | 6.64% |
Summary
The benchmark results presented above show that users should get a performance boost when executing queries for which server needs to create temporary tables.
As always, a big thank you for using MySQL and let us know your feedback!