WL#7158: Move privilege system tables from MyISAM to transactional storage

Affects: Server-8.0   —   Status: Complete

Up until MySQL 5.6 privilege system tables are stored in MyISAM. With the new
data dictionary stored in transactional tables, we need to move these tables.
The intention is to do a minimal effort for this move, so refactoring can happen
Func-Req 1
Modify InnoDB to accept the privilege tables. The privilege tables are 
columns_priv, db, procs_priv, proxies_priv, tables_priv, user.

Func-Req 2
Modify install/bootstrap to create the privilege tables in InnoDB, not MyISAM.

Func-Req 3
Modify mysql_upgrade to move existing privilege tables in MyISAM to InnoDB.

Func-Req 4
Tweak the tests where needed, preserving intended behavior wherever possible.

Func-Req 5
Check that the privilege tables are stored in InnoDB while loading ACL data.
Warn if some privilege table is not in InnoDB. We still should be able to start
in normal mode (without requiring --skip-grant-tables) in this case for upgrade
process to work.

Func-Req 6
Deny ACL statements if storage engine of privilege tables is not InnoDB.

Func-Req 7
Allow CREATE TABLE to create privilege tables in MyISAM. This is necessary
to be able to upgrade installations via mysqldump. Allow ALTER TABLE to 
change the storage engine of privilege tables to MyISAM to enable downgrade.

Func-Req 8
The ACL statement execution must be resistant to parallel SQL transactions
on the privilege tables.

Func-Req 9
The multi-user-account ACL statements must preserve the current well-adopted
(but undocumented) behavior, which is skip the failing user account and
proceed with the following user accounts.

Func-Req 10
The ACL statements must be resistant to potential InnoDB errors which
might happen due to some hardware/environment problem. An error might occur
in any InnoDB handler-call.

Func-Req 11
The ACL statements must use SERIALIZABLE isolation level.

Func-Req 12
The ACL statements are DDL statements (implicit commit before and after
statement execution).
Transactional and non-transactional SE
This WL is about moving MyISAM system tables into InnoDB.

This WL however tends to use "non-transactional" instead of MyISAM and
"transactional" instead of InnoDB. The reasons for that are the following:

  - not all system tables are MyISAM. There are (or might be) CSV tables
    for instance. This WL is about only privileges tables. This is a step
    forward to move all Data Dictionary tables to InnoDB;

  - at the moment, we are talking about the Data Dictionary in InnoDB only,
    but in the future, we might want to the DD tables in other SEs. Thus, it's
    better to talk in general terms from the start and highlight the current
    dependencies with / requirements from InnoDB.

Essential changes
In the scope of this WL the following system tables will be migrated to
the InnoDB storage engine:
  - columns_priv
  - db
  - procs_priv
  - proxies_priv
  - tables_priv
  - user

These tables will be further referenced as "privilege tables" or "ACL

The privilege tables are accessed read-write.

ACL statements
There are the following SQL statements to operate on the ACL tables:

Even though that is not mentioned in the Manual, these statements are DDL
statements, which means they force implicit commit before and after
statement execution.

This was not so important when the ACL tables were MyISAM (no transaction
support anyway), but once ACL tables are stored in InnoDB, that must be
noted in the documentation.

Especially, that is relevant for the 'SET PASSWORD' statement, which looks
similar to the regular SET statement for changing system variables. 'SET
PASSWORD' however has nothing to do with that regular 'SET' statement and
is a DDL statement.

Differences between MyISAM and InnoDB
There are the following main differences between MyISAM and InnoDB which
are relevant for this WL:

  - MyISAM locks the whole table for write (MyISAM has table-level locks).
    InnoDB locks specific rows for write, allowing concurrent access to
    other rows (InnoDB uses row-level locks).

    There are the following practical outcomes :

    1. Error handling

      The ACL code should get ready to handle the following errors from the
      storage engine:

        - lock wait timeout -- another session might lock the row being
          updated by the ACL statement using regular SQL statement
          (SELECT ... FOR UPDATE);

        - duplicate key -- another session might insert a row being added
          by the ACL statement using regular SQL statement
          (INSERT INTO ...)

        - deadlock

      See more in the "Handling errors" section below.

    2. Visibility problem

      Once a MyISAM table is locked, concurrent access is blocked, so the
      reader will always read the latest data. On the contrary, InnoDB
      employs MVCC, which means concurrent transactions might read
      different versions of the same row.
      In other words, FLUSH PRIVILEGES might read an outdated data.

      This problem is solved as follows:

        - We emulate MyISAM table-level locks by using exclusive MDL locks
          on privilege tables. That ensures that once privilege tables are
          opened and locked, there are no other concurrent reader/writers.

          See "Preventing lock-wait-timeout and deadlock (table-level
          locks)" below for more information.

        - The ACL statements use the SERIALIZABLE isolation level.

          See "InnoDB isolation level" below for more information.

    3. Interrelation between InnoDB locks and server (ACL) locks.

      The problem of the existing code design is that it does I/O
      operations (InnoDB calls) while holding the server locks (mutex /
      rw-lock). Under some circumstances, that might lead to a deadlock
      between the server and InnoDB layers.

      This problem is also solved by emulating MyISAM table-level locks.

      See "Preventing lock-wait-timeout and deadlock (table-level locks)"
      below for more information.

  - Commit can not fail for MyISAM, but can fail for InnoDB.

    This is important because the ACL code employs the following pattern:

      1. Acquire LOCK_grant;

      2. Write data to the SE tables;

      3. Update in-memory ACL-cache;

      4. Release LOCK_grant;

      5. Commit;

      6. If commit on the previous step failed, do acl_reload().

    So, the commit on step (5) happens outside of the critical section
    (without holding the LOCK_grant lock). If the commit fails, that means,
    the rows in SE tables were not updated, but the in-memory ACL cache is
    updated and unlocked. That in turn means, there is a gap between (4)
    and (6) when the ACL cache and the data in SE tables are not
    consistent: the ACL in-memory cache is newer than the persistent data.

    If the new information is retrieved from the cache within that gap, it
    might break replication:

      - let's say a new user was added (or a new privilege was granted).
        That user is in the ACL in-memory cache, but not in the persistent
        InnoDB tables.

      - a query comes under the authorization of that user;

      - the query is processed and written to the binlog;

      - acl_reload() happens, the new user disappears;

      - the slave will be unable to execute the query as the slave doesn't
        know anything about that new user.

    In order to fix that problem, the steps (5) and (6) (commit and
    acl_reload()) could be done while holding the LOCK_grant, but that's
    practically impossible as it requires much refactoring of the code
    because it's generally assumed that no commit / open_tables operation
    is done while holding the LOCK_grant.
    In other words, moving the commit/acl_reload() within the critical
    section changes the order in which locks are acquired. Thus, it's a
    risky change.

    NOTE: we need to do the full ACL cache reload (call acl_reload())
    because the cache is not transactional, i.e. we can not rollback the
    changes done to it if we failed to commit them on disk. The only way to
    make it consistent with the data on disk is to throw the cache away and
    reload it from SE.

Handling errors


The thing is that most ACL statements accept multiple user accounts
(components) within one statement.

For example:
  CREATE USER user1@host1, user2@host2, user3@host3;
  DROP USER user1@host1, user2@host2, user3@host3;

The list of such statements:

The question is what to do when we failed to process some component in the
middle of the statement.

For instance, what should be done if we're handling the DROP USER statement
above and user2@host2 does not exist?

The current behavior (which is undocumented, but nevertheless widely
adopted) is to ignore the failed component and continue the statement
execution. In the end, the statement will report a list of failed

  > CREATE USER u1@h1, root@localhost, u2@h2, root@;
  ERROR 1396 (HY000): Operation CREATE USER failed for
  > DROP USER u1@h1, u2@h2, u3@h3, u4@h4, u5@h5;
  ERROR 1396 (HY000): Operation DROP USER failed for 

The decision is to preserve the current behavior as much as possible.

This must be reflected in the Manual.


  - SET PASSWORD takes only one argument.

  - The GRANT behavior is controlled by the NO_AUTO_CREATE_USER,
    which prevents the GRANT statement from automatically creating
    new users if it would otherwise do so.

    If NO_AUTO_CREATE_USER is set, the GRANT behavior is similar
    to the behavior of other ACL statements: skip the failed user
    account, continue the statement execution.

    If NO_AUTO_CREATE_USER is unset, the GRANT can still fail due to
    invalid or non-existing privilege specification (e.g. specified
    column doesn't exist). In that case the whole GRANT statement
    is failed and no change is applied.

Logic & Physical errors

The errors in general can be divided into 2 large groups:

  * Logic errors

    A logic error is an error made by DBA, where she specified wrong or
    inappropriate input for the ACL statement. For example, a non-existing
    user for the DROP USER statement, or existing user for the CREATE USER
    statement, etc.

    Logic errors must be expected and can be frequent, which means it's
    inappropriate to do the ACL cache reload when we encounter a logic

    The current behavior must be preserved for logic errors, which is:
      - ignore the error;
      - continue the statement processing;
      - report errors at the end of of statement execution;
      - avoid acl_reload().

    In order to preserve the current behavior in case of logic errors, the
    following actions are done:
      - failed ACL statement is written into the binary log along with the
        error code;
      - the ACL transaction is committed to persist valid changes.

  * Physical errors

    A so called physical error happens in InnoDB and is raised to
    the server layer. Those errors are results of either a user activity
    or internal InnoDB error. When a physical error has occurred,
    we rollback the whole statement and do acl_reload() to reload 
    the ACL cache.

    More precisely, there are two major sources of the physical errors in

    1. Concurrent access using SQL statements.

      A user/DBA can concurrently change or lock ACL tables using regular
      SQL statements like INSERT, UDPATE, DELETE, SELECT.

      Changing of ACL tables by modifying the raw data in the ACL tables
      can be seen as another interface to the ACL system. The ACL tables
      are documented:


      The user activity can result in the following errors:
        - Lock wait timeout

          A user locked a specific row in one session by 'SELECT ... FOR
          UPDATE' and is executing ACL-statement in another session.

          See more in the sub-section more.

        - Deadlock

          An ACL statement locks ACL records / tables in a particular
          order. It is theoretically possible to have a user transaction
          (with regular SQL statements) which locks the same records /
          tables in the opposite order, resulting in a deadlock.
          See more in the sub-section more.

        - Duplicate key error

          The 'duplicate key' error could also happen theoretically.
          Practically, however, it's not possible to get this error.
          See more in the sub-section below.

      NOTE: The lock wait timeout and deadlock errors might result in
      rollback of the entire InnoDB transaction:

        - there is an system variable (innodb_rollback_on_timeout) that
          specifies if the entire transaction should be rolled back on

        - in case of deadlock, InnoDB chooses the transaction to rollback
          according to its internal rules, which are not controllable to
          the server.

      In short, these errors (lock-wait-timeout, deadlock & duplicate-key
      error) will be handled by preventing them from happening.

    2. Hardware & environmental failures.

      Internal InnoDB error or hardware failure, or errors like "out of
      space". We don't distinguish between errors in this category. This
      type of errors is expected to be rare. We do acl_reload() as the last
      resort when we encounter these errors.

      The user will get a new error code/message saying that the entire
      statement failed due to InnoDB error.


Even though ACL statement is a DDL statement (it commits before and after
execution), one ACL statement can access several ACL records / tables. It
locks them in some order. It is theoretically possible to have a user
transaction (with regular SQL statements) which locks the same tables in
the opposite order, resulting in a deadlock.

It's a matter of race condition to expose such a deadlock. The thing is
that ACL statements are executed under the ACL lock acquired. Recent
refactorings changed the ACL logic so that every SQL statement processing
starts with acquiring the ACL lock. Thus, the following sequence of
operations might happen in order to get a deadlock:

  1. Let's consider some ACL statement, which accesses two ACL tables
  (t1 and t2) in this particular order.

  2. connection A: lock t2:

  3. connection A: start an SQL statement, which will lock t1:

     The execution of this statement must go through the privilege check
     (the ACL lock is acquired, privileges are checked, the ACL lock is
     released) and immediately pause after that.

  4. connection B: start executing the ACL statement. It will acquire the
  ACL lock, lock t1 and hang on attempt to lock t2 (t2 is already locked by
  connection A).

  5. connection A: continue the SQL statement so that it will try to lock
  t1 (now locked by connection B).

  6. This is a deadlock. Depending on weights, transaction A or B will be
  rolled back.

Preventing lock-wait-timeout and deadlock (table-level locks)

The current design of the ACL code doesn't allow to cope with the
lock-wait-timeouts and deadlocks. There are two main design deficiencies:

  1. I/O operations (InnoDB/SE operations) are performed under acquired
  LOCK_grant RW-lock.
    That means that once InnoDB starts any kind of wait, the whole server
    is put to a hold -- any other connection request or SQL statement
    execution need to acquire LOCK_grant in order to authorize the request.

  2. I/O operations (InnoDB/SE operations) are interfered with the ACL
  cache updates.
    That means that when some InnoDB operation failed, the ACL cache might
    have already been updated.

An example of scenarios which make the server practically unresponsive is
the following:

  - connection A: lock a row in mysql.user

    INSERT INTO mysql.user(user, host, ...) VALUES ('u2', 'h', ...);

  - connection B: issue an ACL statement that need the locked row

    CREATE USER u1@h, u2@h;

  CREATE USER successfully adds an InnoDB record for u1@h, inserts the
  corresponding entry to the ACL cache and then hangs on inserting an
  InnoDB record for u2@h. The wait is done with the LOCK_grant acquired.
  So, that it's impossible for any other connection to issue any SQL
  statement which requires authorization.

  Later, when the InnoDB operation fails with the lock-wait-timeout,
  there is no way to restore the ACL cache -- it's impossible to do the
  acl_reload() as it would also hang while reading from mysql.user
  (also with the acquired LOCK_grant).

The only way to migrate to InnoDB without massive re-engineering to fix
these design deficiencies is to simulate the MyISAM behavior with regard to
the locking principles. That is to use table-level locks on the ACL tables.
In fact, we don't need row-level locking scheme for the ACL tables anyway,
so this is not a bad trade-off.

The simplest way to switch to the table-level locking scheme is to use the
MDL layer. The thing is that the ACL tables must be locked with the
MDL_SHARED_NO_READ_WRITE lock. That will guarantee that once the ACL tables
are opened and locked, the ACL statement has an exclusive access to them
and no other SQL statement could block it.

Since we rely on the MDL layer, it is hypothetically possible to get a
conflict (lock-wait-timeout) if another connection used NDB API / other
no-SQL API to access the ACL tables. This WL will not address that issue.

We will add DBUG_ASSERT() after all handler calls, where it's
hypothetically possible to get lock-wait-timeout or a deadlock error,
making sure the calls never return these errors.

Duplicate key error

Looking at the ACL code, it looks like it's possible to get the duplicate
key error by executing a scenario like the following:

  - Issue ACL statement which inserts a new record(s) in one or more

    For example: CREATE USER u1@h1;

  - The ACL statement is executed as follows:

    - The server tries to find the matching record by issuing

    - If the call returned the key-no-found error, the server believes the
      record does not exist and proceeds with inserting a new record;

  - Thus, there is a gap between checking that the record doesn't exist and
    inserting a new record. If another connection was able to insert the
    matching record, the ha_write_row() in the first connection would
    return the duplicate-key error.

By closer investigation, it turned out, that it's not possible for another
connection to come in the middle between ha_index_read_idx_map() and
ha_write_row(). There are two reasons for that:

  1. SERIALIZABLE isolation level.
    The ACL statement is executed under the SERIALIZABLE isolation level
    (see the next section), which means the first read starts a new
    transaction and other sessions/transactions will wait till the ACL
    transaction finishes (or return lock-wait-timeout).

    The logic that ensures the SERIALIZABLE isolation level is a bit
    intricate. Two InnoDB functions controls that:
      - ha_innobase::external_lock()
      - ha_innobase::store_lock()

  2. Table-level locks at the MDL level.

In order to protect from the future InnoDB changes which might lead to the
change of the isolation mode being used for ACL statements, the following
actions will be taken in scope of this WL:

  - we will add a test using a debug sync point. That test will simulate
    the situation described above (another connection inserts a record in
    the middle of ha_index_read_idx_map() and ha_write_row()).

  - we will add DBUG_ASSERT() after all handler calls, which write/update
    records, making sure the calls never return the duplicate-key-error.

Handling "unknown error"
Currently the ACL code sometimes assumes that if the handler call failed,
it's because only one reason (the record not found, for instance). Since
the caller's logic ensures that the handler call must not happen, the code
reports the "unknown error" to the user.

This is not appropriate any longer as InnoDB call might fail for the number
of reasons. The actual (relevant) error must be reported to the user.

InnoDB isolation level
It's expected that all ACL statements are executed under the SERIALIZABLE
isolation level (irrespective of the current isolation level set by the

The SERIALIZABLE level is required as otherwise it might lead to wrong
order in binlog (especially when ACL statements are mixed with the DML
statements on the ACL tables). Also, the SERIALIZABLE level is expected for
all DDL statements in general.

There is no change required to ensure the SERIALIZABLE level for the ACL
statements as it's already done in InnoDB: the lock type is determined in
the following 2 InnoDB operations:
  - ha_innobase::external_lock()
  - ha_innobase::store_lock()
They take LOCK_NONE only for pre-defined set of SQL-commands, that doesn't
include ACL statements.

See also "Preventing lock-wait-timeout and deadlock (table-level locks)" as
the table-level locks described there also helps with getting the
serializable semantics.

Due to the transactional nature of InnoDB, the following scenario could be
hypothetically possible:
  1. connection A: received ACL statement (e.g. DROP USER);
  2. connection A: start executing the ACL statement. The ACL cache and
     data on disk were updated, but the transaction hasn't been committed yet.
  3. connection B: received FLUSH PRIVILEGES
  4. connection B: start executing FLUSH PRIVILEGES, read the stale data
     from InnoDB;
  5. connection A: committed the transaction;
  6. connection B: updated the ACL cache using stale data.

This scenario is not practically possible because of
  a) table-level locks emulated by the MDL-locks
  b) the SERIALIZABLE isolation level.
FLUSH PRIVILEGES will wait for connection A to complete its transaction
before proceed with the read.

A test case should be added to the test coverage.

FLUSH PRIVILEGES (non)atomicity problem
The execution of the FLUSH PRIVILEGES statement is not atomic. This is an
old known problem, which has nothing to do with this WL.

Replication aspect (SBR)
Once ACL tables are migrated to InnoDB, the ACL statements will be written
to the binlog as transactional statements (technically, they will be
written to the transactional cache, which will be put into the binary log
on commit).

Verifying error handling
As it's been already mentioned, the InnoDB handler calls can return errors
in cases which were impossible for MyISAM (like lock-wait-timeout, deadlock
and so on). The existing code is not ready for handling such errors (which
were simply not possible with MyISAM). Thus, the question is how to
identify all the places where such errors can occur and make sure they are
handled properly.

It's proposed to use two ways to resolve this:

  1. Code review

  2. Adding DBUG_EXECUTE_IF() macros after the handler calls and make them
  return "new" error. Then, MTR test can make sure, all places are covered
  and any InnoDB failure will not result in an unhandled error.

Changes to InnoDB
In order to allow the listed system tables to be created with InnoDB
storage engine the function pointer handler::is_supported_system_table will
be initialized by the function innobase_is_supported_system_table() that
returns true in case when it is called for one of the system tables listed

FLUSH PRIVILEGES and InnoDB table export
InnoDB has a special handling of the FLUSH TABLES WITH READ LOCK: InnoDB
does a so-called "table export". The problem is that all FLUSH statements
use the same SQL-command identifier. It must be ensured that the FLUSH
PRIVILEGES statement does not lead to table export.

Changes to SQL-scripts
The sql-scripts that are used to deploy new database will be modified in
order to set the InnoDB as a default storage for the system tables listed
above. Similar changes will be done in sql-scripts used to make server

Changes in behavior on fetching results from ACL tables
MyISAM fetches rows from a table in the order that they were inserted.
InnoDB fetches rows in the order of the primary key. Thus, there will be
behavioral change when the rows are fetched in SE specific order.

In particular, this is relevant to the mysql.user table, which is fetched
by the server code with no specific order: the ACL in-memory cache is
populated from mysql.user in the order of some internal "weights"
calculated for every row/account. However, when the weights are equal, the
cache entries are inserted in the row (SE-specific) order.

That leads to the following change in behaviour:

  - Let's assume there were the following user accounts created:
    - u1@localhost
    - u1@
    - u1@::1

  - They are all essentially the same, but they are different in the MySQL
    terms. The internal weights for them are equal, so they are the ACL
    entries for them are inserted in the row order, which is:

    - MyISAM (the insertion order):
      - u1@localhost
      - u1@
      - u1@::1

    - InnoDB (the primary key order):
      - u1@
      - u1@localhost
      - u1@::1

  - Thus, an authentication request for the user 'u1' from the localhost
    used to be resolved as 'u1@localhost' (1st entry matched), but with
    WL#7158 it will be 'u1@'.

It's a good way to shoot yourself in a foot to have different set of
privileges for 'u1@localhost' and 'u1@', but still, technically
from the MySQL view point, these are completely different accounts.

It was discussed with the Server General team whether we should preserve
the original behaviour, and the decision was that the result is not worth
the hassle.

Changes in ordering of fetched records led to many mtr test failures.

There were two alternatives to fix arisen issues:
   * The first one was to fix every of failed tests to take into account changes
     in record ordering.
   * The second one was to remove records for root@::1 and root@ from
     the script mtr_system_tables_data.sql that is used to create mandatory
     accounts before running tests. Doing so we could retained original behavior
     for tests. Of course such modification required changes for those tests 
     that relied on presence of accounts root@ and root@::1 but there
     were not so many such tests. Actually we only had to worry about those
     test cases that call explicit connect() on behalf of either root@::1 or
     root@ and where the option skip-name-resolve is set, because only
     in this case an ip part of a user account name isn’t converted to
     localhost. There were’t so many tests where this condition were satisfied.

So in result it was chosen the second approach as least expensive.

Also it makes sense to add that the the following approaches were considered
in order to preserve the original behaviour:

  - Adding an additional AUTO_INCREMENT column that can be used to reorder
    result set based on insertion order;

  - Adding an additional TIMESTAMP column. In this case it could lead to
    possible issues with presence of several rows that have the same
    TIMESTAMP value (since rows were inserted at almost the same time).

Neither of these approach were selected as be worthy of notice.

WL#7158 depends on WL#7976 (removal of skip-innodb option) that has already
been pushed.

MyISAM and InnoDB interoperability
In place upgrade
Current procedure for in place upgrade assumes that newer server should be
able to start on old privilege tables in regular mode (without any special
start-up parameter). After that mysql_upgrade is invoked to upgrade
privilege table definitions.

To support this procedure server will be able to load privilege information
from ACL tables in MyISAM at start-up/FLUSH PRIVILEGES time. However we
will report a warning to error log about privilege tables needing upgrade.
Also we will prohibit any user-manipulation statements if privilege tables
are not using InnoDB, thus urging users to upgrade ACL tables definitions.

Upgrade via mysqldump
Current procedure for upgrade using mysqldump involves restoring dump
from old server to a newer server started in regular mode and then running
mysql_upgrade to update privilege tables definition if necessary.
Since dump from old server will include definitions of ACL tables in MyISAM
storage engine in the form of CREATE TABLE statements, this means that we
need to allow creation of privilege tables in MyISAM on newer server.
However we don't need to allow user-manipulation statements until tables
are upgraded to InnoDB, basic authentication/authorization functionality
should be sufficient.

Support for server downgrade
At the moment we have mysql_system_tables_fix_for_downgrade.sql script which
is supposed to be executed against newer server to prepare instance for
downgrade. Particularly this script needs to change privilege tables
definitions to MyISAM. This means that we should allow ALTERing ACL
tables to MyISAM on newer server in regular mode. However again we are
not supposed to be able to run any ACL statements after that.