The long awaited first release candidate of MySQL 8.0 is now available. The theme of this release is “making MySQL better for modern apps”. What does that mean exactly?
- A modern application is mobile first. Mobile-first is not just a theme applied to an existing app, it is about using context about the user (such as their location) and reducing the clicks required for a transaction.
- Unicode (or more specifically UTF-8 encoding) has become universal even in English speaking markets. A key driver is mobile applications, where emojis are frequently used as character input. To support modern applications, it is important to have first-class support for UTF-8 out of the box.
- Modern applications are developed with a premium on time-to-market. You don’t always know how what you developed will be used, and want to be able to respond fast to the directions the market takes you. It is important to have flexibility, and a large part of this story is schemaless JSON support.
“8.0” also marks a departure from the 5.x numbering series, which we have used since 2005. We felt that using a fresh number was important to communicate the depth of some of the core changes. Skipping 6.x and 7.x was intentional to avoid confusion with a previous alpha release and MySQL Cluster.
Highlights
Improved JSON Support
JSON support was perhaps the most well received feature of MySQL 5.7, and with MySQL 8.0 we have expanded it with a number of useful additions. In terms of SQL Functions, we have added functions to check the on-disk size of the native JSON type, array and object aggregate functions, and a pretty function for formatting.
JSON path expressions now support ranges, which means that you can now extract the top n or last item from an array:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
mysql> CREATE TABLE t1 (doc JSON); Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO t1 VALUES ('[1, 2, 3, 4, 5]'); Query OK, 1 row affected (0.00 sec) mysql> SELECT doc->"$[1 to 3]" FROM t1; +------------------+ | doc->"$[1 to 3]" | +------------------+ | [2, 3, 4] | +------------------+ 1 row in set (0.00 sec) mysql> SELECT doc->"$[last-2]" FROM t1; +------------------+ | doc->"$[last-2]" | +------------------+ | 3 | +------------------+ 1 row in set (0.00 sec) |
We have also enhanced the performance of JSON, with optimizer support for a partial update, and improved performance of manipulating JSON in a number of cases such as sorting data.
Document Store
The MySQL document store allows you to treat MySQL like a document database, with a set of NoSQL CRUD APIs to access your data. First introduced in MySQL 5.7, the document store now provides you with consistent reads-and-writes. This gives it a distinct advantage over other NoSQL databases, in that developers don’t need to give up transactional semantics. Data created in the Document Store can also be accessed via the regular MySQL SQL protocol.
In addition to that MySQL document store has expanded its indexing capabilities. To enhance spatial searches, we have added support for spatial indexing of GeoJSON data within MySQL JSON documents, enabling efficient spatial searches of documents in the MySQL Doc Store. On the full text search side of things we now allow full text indexing of entire or part of a JSON document. With this, users can more flexibility search all or part of the data within your JSON documents within MySQL.
The MySQL Shell now lets users add a splash of color, style, and ease of use. New customizations lets users add various session or other contextual information within the shell – helping users track and view things like connection/protocol, active schema, SSL enabled, etc. MySQL Shell users can add custom fonts, browse command history, perform multiline editing, and benefit from full Unicode support.
CTEs and Window Functions
By popular demand, we have added both (recursive) CTEs and Window Functions. This new SQL syntax allows you to express complex queries in ways that were previously not possible, or difficult to write. For example querying hierarchies:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
WITH RECURSIVE cte AS ( # seed SELECT SELECT category_id, name FROM category WHERE parent IS NULL UNION ALL # recursive SELECT SELECT c.category_id, c.name FROM category c JOIN cte ON cte.category_id=c.parent # find children ) SELECT name FROM cte; +-------------+----------------------+ | category_id | name | +-------------+----------------------+ | 1 | ELECTRONICS | | 2 | TELEVISIONS | | 6 | PORTABLE ELECTRONICS | | 3 | TUBE | | 4 | LCD | | 5 | PLASMA | | 7 | MP3 PLAYERS | | 9 | CD PLAYERS | | 10 | 2 WAY RADIOS | | 8 | FLASH | +-------------+----------------------+ 10 rows in set (0,00 sec) |
If you are not familiar with Window Functions, the easiest way to describe them is like a GROUP BY, but not affecting the results. You can picture this as peeking through a window to look at the other rows, such as looking at the aggregate sales performance of an employee, while enumerating through a list of sales:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
SELECT employee, date, sale, SUM(sale) OVER (PARTITION BY employee) AS sum FROM sales; +----------+------------+------+------+ | employee | date | sale | sum | +----------+------------+------+------+ | odin | 2017-03-01 | 200 | 900 | | odin | 2017-04-01 | 300 | 900 | | odin | 2017-05-01 | 400 | 900 | | thor | 2017-03-01 | 400 | 1200 | | thor | 2017-04-01 | 300 | 1200 | | thor | 2017-05-01 | 500 | 1200 | +----------+------------+------+------+ |
Window Functions are extremely powerful, and reduce code complexity significantly. Two of my favourite examples are LAG() and LEAD(), which can be used to access the previous or next row in a set.
Better Handling of Hot Rows
We have added support for the lock modifiers SKIP LOCKED and NOWAIT. These two features allow you to better manage situations where you have tables with hot row contention – such as in the cases of a hot SKU in an commerce application or a set of worker threads that are all reading from the same table trying to find new rows to process.
When encountering locked rows, the default behaviour is to queue waiting for up to 50 seconds for the lock holder to release. You now have the option to error immediately (NOWAIT) or non-deterministically skip past locked rows (SKIP LOCKED). You can even use both at the same time:
1
2
3
4
5
6
|
SELECT seat_no FROM seats JOIN seat_rows USING ( row_no ) WHERE seat_no IN (3,4) AND seat_rows.row_no IN (12) AND booked = 'NO' FOR UPDATE OF seats SKIP LOCKED FOR SHARE OF seat_rows NOWAIT; |
Unicode 9.0
We’ve upgraded our Unicode support to the latest 9.0 standard, and switched from latin1 to utf8mb4 for the default character set. This means that you can now have bacon out of the box:
1
2
3
4
5
6
7
8
9
10
11
12
13
|
mysql> CREATE TABLE t1 (id int not null primary key auto_increment, b varchar(255)); Query OK, 0 rows affected (0.02 sec) mysql> INSERT INTO t1 values (null, '\U+1F953'); Query OK, 1 row affected (0.01 sec) mysql> select * from t1; +----+------+ | id | b | +----+------+ | 1 | 🥓 | +----+------+ 1 row in set (0.00 sec) |
On a more serious note, Unicode (or more specifically UTF-8 encoding) has become universal even in English speaking markets. A key driver is mobile applications, where emojis are frequently used as character input.
The challenge with variable length character encodings has always been performance, which the team has worked hard on improving during the development of 8.0. Current users of MySQL 5.7 should notice a substantial performance increase by upgrading to one of the new collations in MySQL 8.0, and we have also deprecated the earlier utf8mb3 character set.
Collation support has also improved with new collations based on DUCET, accent and case sensitive collations, and support for Japanese including kana sensitivity.
Improvements to Query Consistency
With SQL being a declarative language, a query is similar to what a street address is to GPS navigation. That is to say it conveys a final destination, and not a set of directions on how to get there. The optimizer is the part of the database system which “picks” the best execution plan (to extend the street analogy, is Google Maps). MySQL 8.0 contains several improvements designed to help picking the best plan more consistently.
Optimizer Histograms (not to be confused with Performance Schema histograms) have been introduced to help with cases where a high amount of data skew would have previously led to a poor execution plan choice.
The cost model has been refined to consider the effect of how much of the table or index is resident in memory. For example: On a given query, it may be preferred to range scan only when a large percentage of the index is in memory. The individual constants used in the cost model are user-configurable, and in MySQL 8.0 the usability has been improved slightly by showing the default values:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
mysql> SELECT * FROM mysql.server_cost; +------------------------------+------------+---------------------+---------+---------------+ | cost_name | cost_value | last_update | comment | default_value | +------------------------------+------------+---------------------+---------+---------------+ | disk_temptable_create_cost | NULL | 2017-08-22 09:41:53 | NULL | 20 | | disk_temptable_row_cost | NULL | 2017-08-22 09:41:53 | NULL | 0.5 | | key_compare_cost | NULL | 2017-08-22 09:41:53 | NULL | 0.05 | | memory_temptable_create_cost | NULL | 2017-08-22 09:41:53 | NULL | 1 | | memory_temptable_row_cost | NULL | 2017-08-22 09:41:53 | NULL | 0.1 | | row_evaluate_cost | NULL | 2017-08-22 09:41:53 | NULL | 0.1 | +------------------------------+------------+---------------------+---------+---------------+ 6 rows in set (0.00 sec) mysql> SELECT * FROM mysql.engine_cost; +-------------+-------------+------------------------+------------+---------------------+---------+---------------+ | engine_name | device_type | cost_name | cost_value | last_update | comment | default_value | +-------------+-------------+------------------------+------------+---------------------+---------+---------------+ | default | 0 | io_block_read_cost | NULL | 2017-08-22 09:41:53 | NULL | 1 | | default | 0 | memory_block_read_cost | NULL | 2017-08-22 09:41:53 | NULL | 0.25 | +-------------+-------------+------------------------+------------+---------------------+---------+---------------+ 2 rows in set (0.00 sec) |
Additional query hints have also been added for finer grain control of the optimizer if it makes incorrect choices. Database Administrators can also insert these hints by using the query rewrite feature on the MySQL server side.
GIS
Modern applications are mobile, and (besides using emoji!), one of the characteristics of mobile is that it frequently make use of a user’s location to provide context. In MySQL 5.7, we made the strategic choice to switch from our home-grown GIS functions to using Boost.Geometry as a core library. We have since become regular contributors to Boost, employing two developers to work on it full time.
In MySQL 8.0 GIS is extended to support Geography and Spatial Reference Systems (SRS). We have also implemented standards compliant axis ordering in functions, as well as a ST_SwapXY() helper function to convert incorrectly specified data.
Cloud Friendly
It is becoming more common for MySQL to be deployed in a virtual machine, where the amount of system resources can change as the system is scaled up or down. With the new innodb_dedicated_server option, it’s possible to auto-detect the system memory, and have MySQL adjust appropriately without having to edit configuration files.
It is also more common to have restricted or zero local access (shell or filesystem) to running database servers. This restriction makes it hard for administrators to change MySQL configuration. This was one of the use-cases for the new SET PERSIST feature, which allows you to persist global variables between MySQL restarts:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
mysql> SET PERSIST innodb_buffer_pool_size = 1024 * 1024 * 1024; Query OK, 0 rows affected (0.01 sec) mysql> SELECT * FROM performance_schema.variables_info WHERE variable_name = 'innodb_buffer_pool_size'\G *************************** 1. row *************************** VARIABLE_NAME: innodb_buffer_pool_size VARIABLE_SOURCE: DYNAMIC VARIABLE_PATH: MIN_VALUE: 5242880 MAX_VALUE: 9223372036854775807 SET_TIME: 2017-09-10 12:24:52 SET_USER: msandbox SET_HOST: localhost 1 row in set (0.00 sec) |
Native Data Dictionary
Just as you use a database like MySQL to store your application data, MySQL must also store its meta data (schema names, table definitions etc) somewhere. Traditionally this meta data storage has been split between many different locations (.FRM, .PAR, .OPT, .TRN and .TRG files).
While there are a number of reasons why this was an important core-change, the TL;DR is a lower barrier to entry for new features, and improved reliability as we unify meta data management in transactional storage.
Invisible Indexes
Many of today’s applications run 24×7, leaving no clear maintenance windows for software upgrades or database changes. One key piece of feedback we received from our customers is that any change to indexing can have unintended consequences.
Invisible indexes are a new feature in MySQL 8.0 that provide the ability to mark an index as unavailable for use by the Optimizer. This means that the index will still be maintained and kept up-to-date as data is modified, but no queries will be permitted to make use of the index (even if the query uses a FORCE INDEX hint).
One common use case is to find out which indexes are unused via the SYS schema. You can then set them to INVISIBLE, which is the database equivalent to a Recycle Bin:
1
2
3
4
5
6
7
8
9
10
11
|
mysql> SELECT * FROM sys.schema_unused_indexes; +---------------+-------------+------------+ | object_schema | object_name | index_name | +---------------+-------------+------------+ | world | Country | p | | world | Country | p_c | +---------------+-------------+------------+ 2 rows in set (0.01 sec) mysql> ALTER TABLE Country ALTER INDEX p INVISIBLE; Query OK, 0 rows affected (0.01 sec) |
After some time has elapsed, you can then safely DROP the index. The same process can also be used in reverse for a staged-rollout of indexing changes.
Improvements to Defaults
In total we have changed over 20 defaults from MySQL 5.7 to MySQL 8.0, as we pay close attention to improving the out of the box experience. Notable changes include several changes to replication, additional performance schema instruments enabled by default (memory, transactions, mdl), the default character set is now utf8mb4, and InnoDB now assumes SSDs by default.
The default configuration remains targeted for a system with 512M of memory, making it friendly for development environments and small cloud VMs. But in MySQL 8.0 we have made it easier to switch to a dedicated mode, where it will scale to use all system resources.
For a more detailed description of changes, we recommend users read the What Is New in MySQL 8.0 manual page when upgrading.
Refactoring and Modernization
In MySQL 8.0 we made the strategic decision to require C++11 support, and remove legacy support for the Query Cache, partitioning engine and libmysqld. We have also now standardized the server on the Google C++ coding standard, and started publishing internals documentation generated by Doxygen.
We are diligent in managing the technical debt of a now 20-year old code base, and with each major release target specific areas for improvement. These targets may be based on analysis of historical areas of complaints (as in the case of the native data dictionary) or intended future work, but we also take a more holistic look at how we can simplify our code base and make better use of modern libraries and C++ features.
As a aside: earlier work in refactoring the parser and merging seven grammar rules for SELECT into one made it much easier to add non-recursive CTEs in MySQL 8.0, and removing support for libmysqld reduced package downloads from ~1GB to ~350MB! We look forward to future wins from today’s investments.
And Many More…
In addition to these features, there is also support for the SQL GROUPING function, a new set of UUID helper functions, more flexible UNDO tablespace management, Descending indexes, improvements to the InnoDB memcached API, new bit operators,improvements to optimizer trace, Auto-increment values now persist through restarts, improvements to error logging and filtering, faster table and range scans, more instrumentation with performance schema and many more features that are too long to list!
We would also like to thank those who contributed patches to make MySQL 8.0 better: Davi Arnaut, Laurynas Biveinis, Daniel Black, Andrew Bloomgarden, Zhe Dong, Daniël van Eeden, Jiamin Huang, Christian Hesse, Yasufumi Kinoshita, Alexey Kopytov, Tomislav Plavcic, Stewart Smith, Yura Sorokin, Tsubasa Tanaka, Zhai Weixiang.
Source code is now available on GitHub, with binaries available to download from dev.mysql.com, Please try out MySQL 8.0 RC1, and let us know what you think!