MySQL :: The MySQL 8.0.0 Milestone Release is available

The MySQL Development team is happy to announce our 8.0.0 development milestone release (DMR), now available for download at dev.mysql.com. The source code is available at GitHub. You can find the full list of changes and bug fixes in the 8.0.0 Release Notes. Here are the highlights. Enjoy!

Transactional Data Dictionary

MySQL 8.0 will have a real Data Dictionary implemented as a set of SQL tables stored in InnoDB and we deliver most of it in this 8.0.0 DMR. The FRM, TRG, PAR files are gone. Information Schema has been re-implemented as views on data dictionary tables. DDL statements become atomic and crash safe and reliability is increased. MyISAM is now in principle optional as all system tables have been moved from MyISAM to InnoDB. This blog post will explain the new data-dictionary in some depth, but first we present the other features and changes delivered in this 8.0.0 DMR.

SQL roles

Roles (WL#988) — This work by Kristofer Petterson implements standard SQL Roles in MySQL 8.0. A role is a named collection of privileges. One can grant roles, grant to roles, create roles, drop roles, and decide what roles are applicable during a session. We have also provided an SQL function ROLES_GRAPHML() returning a graphml document representing role subgraphs. In the future, we expect to utilize “system roles” when breaking up the super privilege into finer grained components.

Character Sets

utf8mb4 collations (WL#9108 and WL#9125) — This work by Xing Z. Zhang is part of a bigger effort to improve our UTF8 support. The overall goal is to change the default character set from latin1 to utf8mb4, and the default collation from latin1_swedish_ci to utf8mb4_800_ci_ai. UTF8 is the dominating character encoding for the web, and this move will make life easier for the vast majority of MySQL users. In 8.0.0, we have added 21 language specific case insensitive utf8mb4 collations (see WL#9108) in addition to utf8mb4_800_ci_ai (see WL#9125). If time permits, we will also add accent and case sensitive collation. The default character set has not been changed in 8.0.0, but we plan to do so before GA.

Upgrade Unicode data to 9.0.0 (WL#9479) — This work by Xing Z. Zhang upgrades our collation weight and case mapping data and collations to build our new collations on latest Unicode data. Unicode committee announced Unicode 9.0.0 on Jun 21, 2016.

Indexes

Support for INVISIBLE indexes (WL#8697) — This work by Martin Hansson adds the capability of toggling the visibility of an index (visible/invisible). An invisible index is not considered by the optimizer when it makes the query execution plan. However, the index is still maintained in the background so it is cheap to make it visible again. The purpose of this is for a DBA / DevOp to determine if an index can be dropped. If you suspect an index of not being used you first make it invisible, then monitor query performance, and finally remove the index if no query slow down is experienced. This feature has been asked for by many users, for example Dropbox. See also the Feature Request logged as Bug#70299.

Bit-wise operations on binary data types, IPV6 and UUID manipulation

Bit-wise operations on binary data types (WL#8699) — This work by Catalin Besleaga extends our existing bit-wise operations (‘bitwise AND’, etc), which work with BIGINT, to also work with [VAR]BINARY/[TINY|MEDIUM|LONG]BLOB. Prior to 8.0 bit-wise operations were only supported for integers (and not binaries). If you used bit-wise operations on binaries the arguments were implicitly cast to BIGINT (64 bit) before the operation, thus possibly losing bits. From 8.0.0 and onward bit-wise operations work for all BINARY and BLOB data types, casting arguments such that bits are not lost.

We have improved usability of IPv6 manipulation: In MySQL 5.6 we introduced the INET6_ATON() and INET6_NTOA() functions which convert IPv6 addresses between text form (like 'fe80::226:b9ff:fe77:eb17') and VARBINARY(16). However, until now we could not combine these IPv6 functions with bit-wise operations since such operations would – wrongly – convert output to BIGINT. With WL#8699 this has been fixed. For example, if we have an IPv6 address and want to test it against a network mask, we can now use INET6_ATON(address) & INET6_ATON(network) because INET6_ATON() correctly returns the VARBINARY(16) datatype (128 bits).

We have improved usability of UUID manipulations (WL#8920) — This work by Catalin Besleaga improves the usability of UUID manipulations by implementing three new SQL functions: UUID_TO_BIN(), BIN_TO_UUID(), and IS_UUID(). The first one converts from UUID formatted text to VARBINARY(16), the second one from VARBINARY(16) to UUID formatted text, and the last one checks the validity of an UUID formatted text. The UUID stored as a VARBINARY(16) can be indexed using functional indexes. The functions UUID_TO_BIN() and UUID_TO_BIN() can also shuffle the time-related bits and move them at the beginning making it index friendly and avoiding the random inserts in the B-tree, this way reducing the insert time. The lack of such functionality has been mentioned as one of the drawbacks of using UUID’s.

To summarize, MySQL does not have special data types for IPv6 addresses or UUIDs but instead encourages the use of VARBINARY(16). MySQL provides functions to convert from textual IPv6/UUID representations to and from the more compact VARBINARY(16) datatype. MySQL now offers bit-wise operations on VARBINARY(16) datatype. IPv6/UUID functions combined with bit-wise operations can be used to test, extract, or combine on parts (sub-structure) of the IPv6/UUID, i.e. defining a function over IPv6/UUID content. This function can be used to define the content in a virtual generated column which then can be indexed.

SET PERSIST for global variables

SET PERSIST for global variable values (WL#8688) — This work by Satish Bharathy makes it possible to persist global, dynamic server variables. Currently a number of server variables are both GLOBAL and DYNAMIC and can be reconfigured while the server is running. For example: SET GLOBAL sql_mode='STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION'; However, such settings are lost upon a server restart. This work makes it possible to write SET PERSIST sql_mode='STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION'; The effect is that the setting will survive a server restart. There are many usage scenarios for this functionality but most importantly it gives a way to manage server settings when editing the configuration files is inconvenient or not an option. For example in some hosted environments you don’t have file system access, all that you have is the ability to connect to one or more servers. As for SET GLOBAL you need the super privilege for SET PERSIST.

This feature also introduces a new table variables_info in the performance_schema database. The variables_info table stores the variable name, where the current value came from, and min/max values.

Performance Schema

Performance Schema instrumenting errors (WL#8058) — This work by Mayank Prasad makes it possible to look at aggregated counts of error messages reported by the server. The user can look at statistics from 5 different tables: Global count, summary per thread, summary per user, summary per host, or summary per account. For each error message the user can see the number of errors raised, the number of errors handled by the SQL exception handler, “first seen” timestamp, and “last seen” timestamp. Given the right privileges the user can either SELECT from these tables or TRUNCATE to reset statistics.

Performance Schema Indexes (WL#6616) — This work by Chris Powers speeds up performance schema queries by adding more than 100 indexes on performance schema tables. The indexes on performance schema tables are predefined. They cannot be deleted,added or altered. A performance schema index is implemented as a filtered scan across the existing table data, rather than a traversal through a separate data structure. There are no B-trees or hash tables to be constructed, updated or otherwise managed. Performance Schema tables indexes behave like hash indexes in that a) they quickly retrieve the desired rows, and b) do not provide row ordering, leaving the server to sort the result set if necessary. However, depending on the query, indexes obviate the need for a full table scan and will return a considerably smaller result set. Performance schema indexes are visible with SHOW INDEXES and are represented in the EXPLAIN output for queries that reference indexed columns. See also comment from Simon Mudd @ Booking.com.

Performance Schema, Finer grained interfaces (WL#8880) — This work by Marc Alff breaks the existing interface (psi.h) in smaller parts (per instrumentation). This is an internal refactoring task, no functional change.

Parser Refactoring

We are continuing our effort of refactoring the SQL parser in an incremental way. The old parser had critical limitations because of its grammar complexity and top-down parsing style which lead to poor maintainability and extensibility. In MySQL 5.7 we did all of the DML statements and we are now turning our attention towards the DDL statements.

In MySQL 8.0.0 we have been refactoring the CREATE TABLE statement by Gleb Shchepa finishing umbrella worklog WL#8067 and its children WL#8433 / WL#8434 / WL#8435. The current main parser grammar is a mix of regular SQL statements and special Data Dictionary-related declarations such as parse_gcol_expr and partition_entry which are not valid SQL statements but data definitions from data dictionary. This work is about separating such grammar rules from regular SQL with the help of lexical scanner. WL#8433 is about separating out DD commands from regular SQL queries in the parser grammar, WL#8434 is about refactoring partitioning-related definitions, and WL#8435 is about refactoring columns-related definitions.

Allow parsing a single expression (WL#7840) — This work by Gleb Shchepa makes it possible to parse expressions independently of their context, e.g. the PARTITION BY clause. This work was done as support to the data dictionary project who wanted to store expressions as text strings.

Merge all SELECT rules into one (WL#8907) — This work by Gleb Shchepa merges the 7 separate grammar rules for the SELECT in different contexts into one. This makes the SELECT syntax uniform in all contexts. A benefit will be that adding a WITH clause to various forms of SELECT will be doable with one single code change in sql_yacc.yy.

Introduce <query expression> parser rule (WL#8083) — This work by Martin Hansson introduces the SQL standard’s syntactic rule <query expression> and its sub-rules for production of query syntax. These replace our bison rules for representing select/union syntax. The syntax duplication for queries and subqueries has been removed, both now use <query expression>. The <table reference> syntax rule is now handled properly and the syntax NATURAL INNER JOIN is now allowed.

Optimizer Hints

In MySQL 5.6 and earlier, derived tables were always materialized. In 5.7, derived tables are merged into the outer query in most cases, and materialized in some cases. Merging would give better performance if the outer query’s WHERE contains a selective predicate which can be pushed down to the view’s inner tables. On the other hand, derived tables containing dependent subqueries in the SELECT * FROM t1 LEFT JOIN (SELECT (subq) FROM t2) AS dt ON ...;. Such tables will always be read as many times as there are qualifying rows in the outer table, and the select list subqueries are evaluated for each row combination, so it is better to materialize the derived table.

Enabling merging a derived table or view through a optimizer hint (WL#9307) — This work by Guilhem Bichot allows users to control whether a derived table or view will be merged or materialized using the “merge” and “no_merge” hints. We can simply add a hint to tell the Optimizer to merge instead of materialize the derived table in the example above: SELECT /*+ merge */ * FROM t1 LEFT JOIN (SELECT (subq) FROM t2) AS dt ON ...;.

Cost Model

InnoDB buffer estimates for tables and indexes (WL#7170) — This work by Vasil Dimov provides estimates for how much of a table or an index is currently available in a main memory buffer. When the Optimizer chooses which access method to use, it can benefit if it knows whether data is stored in memory or has to be read from disk. This work implements the necessary support in InnoDB for providing these estimates to the handler. This work is part of a bigger effort to let the optimizer know whether data is in main memory or on disk and adjust its behavior accordingly.

Histograms

With Histograms, the User or DBA can create statistics on data distribution, typically for non-indexed columns, which then will be used by the query optimizer in finding the optimal query plan. As the first steps towards histogram support in 8.0 we have implemented the following two worklogs. The functionality that actually creates and uses histograms will be implemented in separate worklogs in upcoming DMRs.

Classes/structures for Histograms (WL#8707) — This work by Erik Froseth implements classes and functions for histogram statistics. We implemented the necessary functions for building both singleton and equi-height histograms, as well as serializing the histograms to JSON format. This worklog will not have any visible effect to MySQL users, as it won’t expose any new SQL functions or alter any existing behavior.

Persistent storage of Histogram data (WL#8706) — This work by Erik Froseth defines the persistent storage format for histogram data. This is needed so that histogram data does not have to be created each time the server starts. We store the histogram statistics in a new system table: mysql.column_stats. The histogram itself will be stored in a JSON column due to the flexibility of the JSON datatype.

GIS

Spatial Reference Systems (WL#8579) — This work by Norvald Ryeng adds support for spatial reference system (SRS) definitions, both ellipsoids with latitude and longitude coordinates and flat map projections. This work is the first stepping stone on our road to full geography support. Until now, MySQL has only understood one coordinate system, a unitless 2d Cartesian plane that is not referenced to positions on Earth (SRID 0). This worklog adds support for georeferenced ellipsoids and 2d projections. This includes data dictionary storage and caching of SRS definitions, a parser for the WKT description of SRSs, and a set of default SRSs from the EPSG Geodetic Parameter Dataset. This work does not change the actual calculations. However, it provides a way to add SRS metadata to geometric objects. As of now, computations in geographic SRSs will still be done in SRID 0, but a warning will be issued.

Mutator ST_SRID (WL#8543) — This work by Hans Melby makes it possible to update the SRID of an existing geometry. The geometry’s coordinates are not transformed, but the SRID tells the server which spatial reference system the geometry is in. This is useful when geometry data is not in the desired spatial reference system, e.g., when upgrading from older MySQL versions that predate spatial reference systems support.

Mutator ST_X and ST_Y (WL#8606) — This work by Jens Even B. Blomsøy makes it possible to update the X and Y coordinates of existing point geometries. It also makes the ST_X and ST_Y functions consistently raise an error if the geometry is not a point. This improves both OpenGIS and SQL/MM compliance.

Improved Performance for Scans

Optimizer provides InnoDB with a bigger buffer (WL#7093) — This work by Knut Hatlen improves MySQL/InnoDB range scan performance by giving InnoDB a buffer and a batch size to read. Even before this work, in order to reduce the cost of latching and B-tree navigation, InnoDB uses a small internal buffer to fetch records in batches for scans. However, the server layer knows how much data it needs and should decide the batch size. This is what this work delivers. In SELECT queries, a buffer will be provided for each table and indexscan that is believed to read more than one row. The bigger buffer will primarily improve the performance of queries that perform scans which read a large number of rows, for example SELECT * FROM t; or SELECT * FROM t WHERE pk BETWEEN 1000 AND 10000; In these cases we have measured between 5-20% performance improvement.

Plugin Infrastructure

Service registry and component infrastructure (WL#4102) — This work by Marcin Babij implements a new Plugin infrastructure. The old infrastructure is still operational. We will gradually shift from the old to the new infrastructure. In the new infrastructure each component will provide a set of services that other components can consume. To facilitate this there is a registry of all services available to all components. Each component will communicate with other components only through services and will explicitly state the services it provides and consumes. The infrastructure will enable components to override and complement functionality provided by other components through re-implementing the relevant service APIs

BLOB

InnoDB: Refactor BLOB code to facilitate partial fetch/update (WL#9141, WL#8985) — This work by Annamalai Gurusami is part of a bigger effort to implement partial fetch and update of BLOBs. So far we have refactored uncompressed BLOB code as described in WL#9141 and compressed BLOB code as described in WL#8985. We want to have an efficient BLOB implementation since more and more people use BLOBs and they use bigger and bigger BLOBs. A driving factor here is that JSON documents are stored as BLOBs. A main goal for 8.0 is to replicate partial BLOB updates, to make the update of a single JSON attribute replicate efficiently.

InnoDB Memcached

InnoDB_Memcached: support multiple get and range search (WL#6650) — This work by Jimmy Yang enhances the InnoDB Memcached functionalities with multiple get operations and support for range queries. We added support for the multiple get operation to further improve the read performance, i.e. the user can fetch multiple key value pairs in a single memcached query. Support for range queries has been requested by Yoshinori @ Facebook. With range queries, the user can specify a particular range, and fetch all the qualified values in this range. Both features can significantly reduce the number of roundtrips between the client and the server.

Persisting Fast Changing Meta-data

InnoDB persistent max value for autoinc columns (WL#6204) — This work by Bin Su persists the AUTOINC counters by writing them to the redo log. This is a fix for the very old Bug#199. The MySQL recovery process will the replay redo log and ensure correct values of the AUTOINC counters. There won’t be any rollback of AUTOINC counters. This means that database recovery will reestablish the last known counter value after a crash. It comes with the guarantee that the AUTOINC counter cannot get the same value twice. The counter is monotonically increasing, but note that there can be gaps (unused values). The lack of persistent AUTOINC has been seen as troublesome in the past, e.g. see Bug#21641 reported by Stephen Dewey in 2006 or this blog post .

InnoDB: Persist the “corrupted” flag in the data dictionary (WL#7816) — This work by Bin Su makes the “corrupted” flag persistent by writing it to the redo log. For example, when InnoDB notices corruption in an index tree, it flags it as corrupted, both in its internal data dictionary cache and in persistent storage. This means that the server will get to know about corrupted indexes during recovery and thus avoid using such indexes for queries.

Temporary Tables

InnoDB: Map compressed temporary tables to uncompressed (WL#7899) — This work by Bin Su removes the InnoDB support for compressed temporary tables. Temporary tables are short-lived, ideally never written back to files, unless the buffer pool space is tight. It does not seem to make sense to try to reduce the file I/O by increasing the memory and CPU usage. All temporary tables will be created in the shared temporary tablespace “ibtmp1” introduced in WL#6560. Two columns will be dropped from the information schema table because they will be always FALSE: INNODB_TEMP_TABLE_INFO.PER_TABLE_TABLESPACE and INNODB_TEMP_TABLE_INFO.IS_COMPRESSED.

Store temporary table metadata in memory (WL#7784) — This work by Dmitry Lenev ensures that metadata for temporary tables are stored in memory only (not persisted). Until now, temporary table metadata was stored in FRM files. The historical reasons for this became obsolete in 5.7 because InnoDB stores temporary tables in a dedicated tablespace, which is discarded on startup. Note that the MySQL temporary table implementation differs from the SQL Standard in the sense that MySQL temporary tables are not shown in the information schema.

Performance & Scalability

InnoDB: Group purging of rows by table ID (WL#9387) — This work by Sunny Bains groups the undo rows by table ID and then parcels the work to the purge threads based on the table id. The purpose is to reduce dict_index_t::lock contention. Since the introduction of multi-threaded purge the purge threads work on random undo log records. If different purge threads all purge from the same table this leads to higher contention on the dict_index_t::lock (RW-LOCK). See also Bug#81368 reported by Domas Mituzas @ Facebook.

InnoDB: Remove the buffer pool mutex (WL#8423) — This work by Shaohua Wang splits the buffer pool mutex into several mutexes (free_list, LRU_list, zip_free, and zip_hash). This avoids or reduces scalability problems for some workloads where the old buffer_pool_mutex became hot. Thank you to Percona who contributed the original patch by Bug#75534 !

Split LOCK_thd_list and LOCK_thd_remove mutexes (WL#9250) — This work by Jon Olav Hauglid removes a current bottleneck for connect/disconnect performance by splitting up the two mutexes LOCK_thd_list and LOCK_thd_remove. Both mutexes protect the global list of current connections (THDs). Performance testing of shows around 5% improvement in TPS for point selects with reconnect between each query.

Compilers and Libraries

Enable use of C++11 in MySQL server (WL#8896) — This work by Jon Olav Hauglid enables use of C++11 features in 8.0. C++11 has a number of new features, especially related to concurrency, that are of interest. Compiler support includes GCC support for C++11, Clang support for C++11 , Visual Studio support for C++11, Solaris Studio support for C++11.

InnoDB: Move from homebrew thread management to std::thread (WL#9359) — This work by Sunny Bains introduces the use of std::thread library which will replace our own homebrew thread libraries. This is enabled by moving to C++11 in 8.0. The C++11 standard supports threading.

Doxygen Documentation

Starting with MySQL release 8.0.0, the source code is now commented using the Doxygen documentation generation tool. Several community members have pointed out that our internals documentation has been outdated (like here). With this initiative we hope to keep the internal docs up-to-date by aligning it with the source code and utilizing the Doxygen formatting. By documenting our source code better we make it easier to understand and maintain, and lower the barrier of entry for code contributions. This is our first version, we will build it out over time. Check it out here: http://dev.mysql.com/doc/dev/mysql-server/8.0.0/.

Replication

Restore backup on existing GTID-aware server – make GTID_PURGED settable always (WL#6591) — This work by Andrei Elkin makes it possible to restore backups on an existing server without losing existing Global Transaction ID (GTID) information and binary logs. Until now, it has only been possible to add GTIDs to GTID_PURGED only when GTID_EXECUTED is empty, i.e. when restoring a backup on an empty server. This work makes it possible to add GTIDs to GTID_PURGED at any time, even when GTID_EXECUTED is nonempty.

Enhanced Applier Thread Progress Details (WL#7364) — This work by Luis Soares adds monitoring information in performance schema for how far the SQL thread has processed each ROW event. For this purpose Row Based Replication (RBR) stages have been added to performance_schema stages tables. The motivation is that the stats that are shown through SHOW PROCESSLIST, INFORMATION_SCHEMA.PROCESSLIST, PERFORMANCE_SCHEMA.THREADS and PERFORMANCE_SCHEMA.EVENTS_STAGES_* have been misleading when the SQL thread applies row events. In fact, currently wrong information can be reported through SHOW PROCESSLIST. Therefore, users should at least be able to see through performance schema tables what is the correct state of the SQL thread and what exactly it is doing at the moment. This minimal information is of great help when troubleshooting RBR problems. We hope to extend this in the future to cover more stats for RBR. See also Bug#69360 by Simon Mudd, Bug#62019 by Mark Callaghan and Bug#53375 by Shane Bester.

Transactional Data Dictionary – Deep Dive

MySQL 8.0 comes with a transactional data dictionary. The goals of this project have been published earlier, e.g. in Ståle Deraas’ blog post. Here our focus will be what it is from a technical perspective.

MySQL 5.7 does not have an explicitly defined data dictionary, an approximation can be to point to persistent .FRM files (file system) and the table definition cache as the in-memory representation of tables. This changes with MySQL 8.0. MySQL 8.0 has an explicit definition of what the data dictionary is and is not, namely an identified set of meta-data tables stored in transactional storage (InnoDB). WL#6379 explains the data dictionary table structure. Moreover, each MySQL release will have a data dictionary version number which will be incremented whenever there is a change in the data-dictionary definition. The in-memory representation is in the form of objects kept in a dictionary cache. In 8.0.0 DMR, the data dictionary tables are stored in file-per-table tablespaces. Before 8.0 GA we will store dictionary tables in a single dedicated dictionary tablespace.

With the introduction of an explicit data dictionary it becomes more important to note the difference between *system tables* and the data-dictionary. Historically, MySQL has stored some of its meta-data as normal MyISAM tables in the *mysql database* (schema), in contrast to meta-data information stored in special files (.FRM, .PAR, .TRG, .TRN). Note that the concept of system tables still exists in MySQL 8.0, but these tables are now stored in InnoDB. Some of the MySQL 5.7 system tables have now been folded into the data dictionary (event and proc) while the rest are kept as system tables. The distinction between the data dictionary and the set of system tables is somewhat pragmatic, but one can say that data dictionary contains meta data needed to execute SQL queries while system tables contain auxiliary meta-data like timezone and help information. In the future it might be natural to fold additional system tables into the data dictionary in order to have a common infrastructure for all meta-data.

The distinction between the data dictionary and system tables has implications for *upgrade*. The data dictionary has an on-disk version number and the server will be able to detect the difference between its binary version and on-disk dictionary version and then execute built in upgrade code. So, the server can have built-in upgrade for the data dictionary, but this is yet to be implemented. However, upgrade for system tables works as in 5.7 by running the mysql_upgrade script in order to add or modify system tables.

Conceptually, the Information Schema is a view on the meta-data, mostly on meta-data stored in the data dictionary. In 5.7 and earlier, the information schema queries were implemented in a highly specialized way, not using the Optimizer. With 8.0 this changes to standard SQL queries on normal SQL tables fully utilizing the Optimizer and the SQL execution machinery. As a result, information schema queries see an average speedup of about 100X. Note that we have decided to hide the dictionary tables from the end user, they are only accessible as information schema views.

The above is roughly what has been delivered in the 8.0.0 DMR, the first step in the overall data dictionary project (runtime step). Here comes a preview of the second and last step that will complete the project (innodb step). There are three main parts:

First, we will eliminate the InnoDB internal data dictionary and use one common data dictionary and ensure crash safeness on all DDL statements. Meta-data for Foreign Keys are moved up from the SE layer to the common data dictionary paving the way for proper meta-data locking for Foreign Keys. This part will significantly increase MySQL reliability.

Second, we will write meta-data in a serialized form into user tablespaces on disk, providing meta-data redundancy and facilitating movement of data. We will also provide tools to check, extract and insert tablespace meta-data. This part will match the benefits associated with the .FRM, the ability to edit meta-data for some disaster recovery scenarios.

Third, we will implement import/export of self contained tablespaces (data+meta-data). This part will match the benefits associated with moving MyISAM tables around.

With this overview, here are the worklogs we delivered in 8.0.0.

Start & Upgrade

Bootstrap code for new data dictionary (WL#6394) — This work by Sivert Sørumgård implements server bootstrap in the context of the new data dictionary in 8.0. Bootstrapping is done when starting the mysqld server process. There are four basic aspects of this: 1) Creating the data dictionary while starting a new server instance. 2) (Re)starting a server with existing data. 3) Upgrading from a previous 8.0 version. 4) Upgrading from 5.7 to 8.0, i.e., start using the new data dictionary on already existing user data. The general direction is for the server to be able to bootstrap itself and eliminate external scripts from the process.

Upgrade to Transactional Data Dictionary (WL#6392) — This work by Abhishek Rajan implements in-place upgrade from 5.7 to 8.0. The upgrade works as before, like from 5.6 and 5.7. The user starts the 8.0.0 server on a 5.7 image and then runs mysql_upgrade.

Sidenote: Logical upgrade works as well but the user should consult the user documentation for correct options to mysqldump. This will fix any changes in system tables between mysql-5.7 and mysql-8.0. This will also fix performance schema, sys schema, and information schema databases.

Information Schema

New Data Dictionary and I_S integration (WL#6599) — This work by Gopal Shankar re-implements information schema so most data is read from data dictionary tables and thus avoids creation of temporary tables and preparation of TABLE_SHARE object upon every query execution that contains information schema tables. In contrast, the information schema tables in 5.7 are temporary tables that are created and filled during execution of the query.

InnoDB: provide a way to do non-locking reads (WL#7464) — This work by Satya Bodapati avoids information schema queries that get blocked by a parallel DDL operation. The new data dictionary allows information schema queries to get their data directly from data dictionary tables. Information Schema tables will be represented as views, thus a query from an information schema table will be a query from a view. Information schema queries can be executed under different isolation levels that are set by the user. Hence it is possible that information schema queries would be blocked by a parallel DDL operation. This would be a compatibility problem wrt. 5.7. In order to avoid this problem we implemented a way to request non-locking read on a per handler instance.

Use non-locking reads for data dictionary tables under I_S view (WL#7907) — This work by Gopal Shankar implements non-locking reads on data dictionary tables when they are used under a “system view”, which matches the behavior in 5.7. This is the server layer counterpart to WL#7464 explained above.

Change DDL to update rows for view columns in DD.COLUMNS and other dependent values (WL#7167) — This work by Praveenkumar ensures that a view’s column information is stored into the data dictionary table mysql.COLUMNS. Moreover, the implementation of DDL statements affecting underlying tables of a view is extended to update view columns information in mysql.COLUMNS and mysql.TABLE.IS_UPDATABLE. This speeds up information schema queries because column information can be found by direct lookup.

Protect Data Dictionary tables (WL#6391) — This work by Sivert Sørumgård restricts the availability of data dictionary information to the users. User access to meta data is supported by the Information Schema. We do this to maintain predictable data dictionary table definitions, maintain meta data correctness, and separate interface from implementation. Access to executing DDL and DML statements is still supported for statements originating from the server itself, e.g. during initial start and upgrade.

System Tables

We have finalized the work we started in 5.7 moving system tables from MyISAM to transactional storage (InnoDB). The new implementation assumes transactional semantics but can in principle be stored in any transactional storage engine. The 5.7 system tables event and proc have been folded into the data dictionary. Note: We have also added a few new system tables, but this is not related to the new data dictionary. New system tables are: role_edges and default_roles (WL#988), components (WL#4102), st_spatial_reference_systems (WL#8579) and column_stats (WL#8706).

In 5.7 we started the work to move system tables into transactional storage, we moved replication related system tables, time zone tables (WL#7159), plugins tables and servers tables (WL#7160). Here in the 8.0.0 DMR we have finalized this work by moving the privilege system tables (WL#7158 by Dmitry Shulga) and User-Defined Function (UDF) system tables (WL#8980 by Thayumanavar Sachithanantha). The main work here has been to change the surrounding code to correctly deal with transactional semantics when interacting with transactional storage.

Make user management DDLs atomic (WL#9045) — This work by Harin Vadodaria makes sure that DDLs related to user management behave in atomic fashion. CREATE/ALTER/DROP USER and GRANT/REVOKE are used to manage users’ credentials and privileges. These statements allow modification for multiple users. However, until now modifications performed have not been atomic. It is possible that a user management DDL can successfully process some of the users in the list but produces error for others. This can create problems in a replication setup as well.

Seralized Dictionary Information (SDI)

Provide data dictionary information in serialized form (WL#7069) — This work by Dyre Tjeldvoll implements replacement functionality for the old .FRM files in the context of the new data dictionary. The old .FRM files used to store the meta data in a serialized form, i.e. the CREATE TABLE statement in text format. The new serialized form is a JSON formatted text. The new serialized format will be used by NDB as a replacement for “.FRM shipping”, and by InnoDB to be stored in .IBD files to support transportable table spaces, and to support hand driven disaster recovery as a last resort. This work by Dyre provides an extension of the handler and handlerton APIs with virtual functions for handling serialized meta data. Each storage engine can handle this information according to its preference, e.g. NDB can ship the meta data across the cluster to other nodes for re-creation. InnoDB can store the meta data in tablespace files to make them self contained. Other storage engines like MyISAM can rely on the default implementation, which will serialize the meta data and save it into a separate file, hence providing the same benefits as the .FRM files. Note that this is preparatory work, no effect in 8.0.0.

Data Dictionary Cache

Dictionary object cache (WL#8150) — This work by Sivert Sørumgård implements the data dictionary cache. It provides a shared cache for data dictionary objects, it tracks usage of data dictionary objects, and it handles modification of data dictionary objects. The dictionary object cache is designed to replace several old caches.

InnoDB

InnoDB: Provide storage for tablespace dictionary (WL#7053) — This work by Satya Bodapati provides an API and a storage location for serialized dictionary information. The work is done in the context of the new data dictionary. In 8.0, in addition to storing meta-data in normal InnoDB tables in the global data dictionary tablespace, the meta-data is also stored in a serialized form together with the user tables containing the data. This makes the user tablespaces self descriptive and they can thus be moved to a different MySQL instance (import/export), tools can extract meta-data from user tablespaces, etc. This also provides for redundancy, with one copy in the Global data dictionary tablespace and one or two copies in tablespaces containing the data. The redundancy provides for a way out in certain disaster recovery scenarios where the dictionary information is lost or damaged. Note that this is preparatory work, no effect in 8.0.0.

External tool to extract InnoDB tablespace dictionary information (WL#7066) — This work by Satya Bodapati implements a tool for extracting Serialized Dictionary Information (SDI) from an InnoDB tablespace. The tool can be used online and the user can specify which of the two copies to extract. The tool will write the SDI record id, type, data in JSON format. The tool can for example be used in certain disaster recovery scenario: A table is corrupted in a tablespace, the SDI should be extracted into a text file and the meta-data for the corrupted table removed. With the edited SDI & .ibd, it can be used to import the tablespace with the corrupted table. Note that this is preparatory work, no effect in 8.0.0.

Data Dictionary APIs

Develop Google test framework and guidelines for data dictionary API (WL#7770) — This work by Sivert Sørumgård develops a Google Test framework to write unit tests for data dictionary API work. It also provides guidelines for how to write such tests.

Make sure errors are properly handled in data dictionary API (WL#7771) — This work by Gopal Shankar harmonizes error handling across all data dictionary APs and ensures that error reporting is consistently implemented and complete across these APIs.

Formulate framework for API for data dictionary (WL#6380) — This work by Alexander Nozdrin defines basic principles and guidelines for implementing the new data dictionary. In other words: how new API should look like and what principles should be followed.

Implement common code for different data dictionary APIs (WL#7284) — This work by Alexander Nozdrin implements common code following the guidelines set in WL#6380. Different API implementations are based on some core API and have common code. This work specifies that core API.

Define and Implement API for data dictionary objects (multiple worklogs) — This work by Gopal Shankar work specifies interfaces of the following new data dictionary object types: Function, Procedure, Parameter (WL#6384), Event (WL#6388), Trigger (WL#6383), Partition, Partition_value, Partition_index (WL#7630), Abstract_table, Table, Column, Column_type_element, Index, Index_column, Foreign_key, Foreign_key_column (WL#6382), View (WL#6389), Tablespace, Tablespace_file (WL#6387), and Charset, Collation, Schema (WL#6385) . The work follows guidelines from API framework class definitions described in WL#7284.

Use new data dictionary API for data dictionary objects (multiple worklogs) — This work by Gopal Shankar, Mattias Jonsson, Dmitry Shulga, Praveenkumar Hulakund, and Thayumanavar Sachithanantha changes the server to use the new data dictionary. Tables: WL#6390 makes the server use the new API defined by WL#6382 / WL#6389 / WL#6387 / WL#6385 to store metadata instead of .FRM files. Partitions: WL#7836 makes the server use the new API defined by WL#7630 instead of .PAR files. Triggers: WL#7896 makes the server use the new API defined by WL#6383 instead of .TRG/.TRN files. Stored Routines: WL#7897 makes the server use the new API defined by WL#6384 to create/drop/load triggers. Events: WL#7898 makes the server use the new API defined by WL#6388 to create, alter, drop and load events.

Move FOREIGN KEY constraints to the global data dictionary (WL#6929) — This work by Jon Olav Hauglid populates the new data dictionary tables with foreign key information. See WL#6379 for the foreign key table definitions. Meta-data for Foreign Keys are moved up from the SE layer to the common data dictionary paving the way for proper meta-data locking for Foreign Keys. Note that this is preparatory work, right now the new dictionary tables are populated but not used. Now, in 8.0.0 the Foreign Keys work as they do in 5.7.

Deprecation and Removal

Deprecate and remove partitioning storage engine (WL#8971) — This work by Sivert Sørumgård deprecates (5.7) and removes (8.0) the partitioning storage engine (ha_partition). The responsibility for partitioning has been moved down to the storage engine layer. InnoDB supports partitioning from 5.7 and onwards.

Remove –ssl, –ssl-verify-server-cert client-side options (WL#9091) — This work by Ramil Kalimullin removes the --ssl and --ssl-verify-server-cert client options in 8.0 (deprecated in 5.7). This includes removal of --ssl synonyms such as --skip-ssl. (The server side --ssl option remains unchanged). These client side options were replaced by the --ssl-mode client option in 5.7 (WL#8785).

Deprecation warnings for bit-wise operation with [VAR]BINARY args (WL#9015) — This work by Catalin Besleaga adds a warning in 5.7 in the cases where bit-wise operations changes from returning BIGINT in 5.7 to returning BINARY in 8.0. See “Backward-compatibility and incompatibilities” section in WL#8699.

Remove deprecated GIS functions (WL#8157) — This work by Erik Froseth removes function names that were deprecated (WL#8055, WL#7541) in 5.7. This is a standardization effort, the actual functionality is still present but another function name with the ST_ or MBR prefix must be used.

Deprecate and remove mysql_shutdown() (WL#9014) — This work by Georgi Kodinov deprecates the COM_SHUTDOWN RPC and the C API function mysql_shutdown() in 5.7 and removes them in 8.0. This shutdown functionality has been replaced by the SHUTDOWN SQL statement (COM_QUERY RPC), see WL#6784.

Deprecate and remove the parameter innodb_support_xa (WL#8843) — This work by Marko Mäkelä deprecates the obsolete innodb_support_xa start-up option / session variable in 5.7 and removes it in 8.0. The default value of innodb_support_xa was ON already in May 2007 (built-in InnoDB in MySQL 5.1). So, except that this parameter no longer exists in 8.0, there are no behavioral changes associated with this removal.

InnoDB: Remove deprecated parameter innodb_stats_sample_pages (WL#8903) — This work by Annamalai Gurusami removes the parameter innodb_stats_sample_pages in 8.0, the parameter was deprecated in 5.6. You should use >innodb_stats_transient_sample_pages instead.

InnoDB: Remove deprecated parameter innodb_locks_unsafe_for_binlog (WL#8894) — This work by Annamalai Gurusami removes the parameter innodb_locks_unsafe_for_binlog in 8.0, the parameter was deprecated in 5.6. The option is no longer needed as the READ COMMITTED transaction isolation level does the same as the parameter innodb_locks_unsafe_for_binlog.

InnoDB: Remove the deprecated parameter innodb_checksums (WL#8893) — This work by Annamalai GurusamiIn removes the parameter innodb_checksums that was deprecated in 5.6. The functionality is covered by innodb_checksum_algorithm also introduced in 5.6.

InnoDB: Remove the use of *.isl files (WL#6416) — This work by Marko Mäkelä removes *.isl (InnoDB Symbolic Link) files introduced in 5.6. The functionality has been replaced with the functionality in the new data dictionary.

InnoDB startup refactoring (WL#7488 ) — This work by Marko Mäkelä refactors the InnoDB startup code. On Windows, setting innodb_flush_method has been decoupled from innodb_use_native_aio. The value innodb_flush_method=unbuffered can be used instead.

InnoDB: Remove deprecated file format parameters in 8.0 ( WL#7704) — This work by Marko Mäkelä removes the configuration variables (options) innodb_file_format, innodb_file_format_check, innodb_file_format_max, and innodb_large_prefix in 8.0. These options were deprecated in 5.7, see WL#7703. The file_format column in information schema tables innodb_sys_tablespaces and innodb_sys_tables has also been removed. See also WL#8307 “Make ROW_FORMAT=DYNAMIC the default”.

Deprecate mysql_plugin in 5.7 and remove in 8.0 (WL#8927) — This work by Dyre Tjeldvoll deprecates the mysql_plugin utility in 5.7, and removes it in 8.0. This utility is no longer in use, alternatives include loading plugins at server startup using the --plugin-load or --plugin-load-add option, or at runtime using the INSTALL PLUGIN statement.

Remove deprecated LOAD DATA events from replication code (WL#9218) — This work by Libing Song removes binlogging of the following events for LOAD DATA statements: LOAD_EVENT, CREATE_FILE_EVENT, EXEC_LOAD_EVENT and NEW_LOAD_EVENT. They were deprecated in 5.0. These events have been replaced by BEGIN_LOAD_QUERY_EVENT and EXECUTE_LOAD_QUERY_EVENT.

Remove the code in log_event_old.cc (WL#9217) — This work by Libing Song removes some code that has not been in use since 5.1. See WL#9218 above.

Remove mysql_install_db and server –bootstrap option (WL#9071) — This work by Sivert Sørumgård removes the mysql_install_db and the server --bootstrap option in MySQL 8.0, both deprecated in 5.7 by WL#7307. This has been replaced by new bootstrap code, see WL#6394 above.

That’s it for now. Thank you for using MySQL!