WL#12968: Configure replication applier to require row-based replication

Affects: Server-8.0   —   Status: Complete

EXECUTIVE SUMMARY

This worklog restricts replication so that a channel accepts only row-based replication. This restricts the type of instruction executed on the slave and, as a side effect, limits the number of privileges a user needs when associated with the replication applier.

FUNCTIONAL REQUIREMENTS

F1. It shall be possible to restrict the slave to accept only row-based replication events.
It shall be possible to configure this mode, REQUIRE_ROW_FORMAT, through a CHANGE MASTER TO command.

F2. Furthermore this mode shall forbid the creation of temporary tables by the slave applier and the execution of some types of events.

F3. The mode shall also prevent the application of LOAD DATA, INTVAR, RAND or USER_VAR events.

F4. REQUIRE_ROW_FORMAT should be disabled by default.

F5. REQUIRE_ROW_FORMAT should be enabled for group replication channels when they are created by the user or the plugin.

F6. REQUIRE_ROW_FORMAT can't be disabled for group replication channels.

F7. It is not possible to set a valid, non-null user to PRIVILEGE_CHECKS_USER if REQUIRE_ROW_FORMAT = 0.

F8. It is not possible to set REQUIRE_ROW_FORMAT = 0 if PRIVILEGE_CHECKS_USER is set to a valid user.

F9. Replication channels that exist before an upgrade shall, after the upgrade, use REQUIRE_ROW_FORMAT = 0 if they are regular replication channels with no PRIVILEGE_CHECKS_USER defined,
and REQUIRE_ROW_FORMAT = 1 if they are GR channels or have a defined PRIVILEGE_CHECKS_USER.

F10. If REQUIRE_ROW_FORMAT = 1, the replication applier shall not set @@session.pseudo_thread_id.

F11. If REQUIRE_ROW_FORMAT = 1, then Format_description_log_event shall not attempt to remove files from the @@global.slave_load_tmpdir directory.

F12. It shall be possible to observe the new configuration through performance_schema tables.

F13. REQUIRE_ROW_FORMAT shall be persisted in replication repositories.

F14. RESET SLAVE should not clear the value of REQUIRE_ROW_FORMAT.

F15. A new session configuration variable is added, @@session.require_row_format, that restricts in a similar way client sessions.

F16. When @@session.require_row_format is enabled, DML operations shall only be executed through encoded BINLOG statements.

F17. When @@session.require_row_format is enabled, CREATE or DROP TEMPORARY TABLE queries shall not be allowed.

F18. No privileges are needed to enable @@session.require_row_format.
The user needs SESSION_ADMIN in order to disable it.

F19. There shall be a new option to mysqlbinlog:

--require-row-format

When this option is given, mysqlbinlog shall do the following:
- Stop with an error if a LOAD DATA, INTVAR, RAND or USER_VAR event occurs.
- Stop with an error on the creation or drop of temporary tables.
- Stop with an error when a DML query logged using statement-based logging is found.
- Output a SET @@session.require_row_format = 1 statement at the beginning of the output.
- Omit the output of SET @@session.pseudo_thread_id, which is usually shown in the output.

NON-FUNCTIONAL REQUIREMENTS

NF1. When enabled on a slave server, REQUIRE_ROW_FORMAT shall reduce performance by less than 1% for all slave workloads.

NF2. When disabled, @@session.require_row_format restrictions should have no effect on transaction execution performance.

NF3. When enabled, @@session.require_row_format restrictions shall reduce performance by less than 1% for all workloads.

BASICS

This section describes the basic building blocks of this worklog and how they function.

2.1. Slave side

On the slave side, in order to restrict a channel to row-based replication, the user shall execute a CHANGE MASTER command.
The command shall have a REQUIRE_ROW_FORMAT option:

  CHANGE MASTER TO REQUIRE_ROW_FORMAT = [0|1]

This option shall be recorded in the channel Relay_log_info table/file.

What to do

When executing and before queuing an event, the slave infrastructure shall evaluate it in the context of the event flow and break replication on invalid cases.
This work shall be done with the help of the Transaction boundary parser output.

The slave shall error out when running in row-based only mode if any of the following holds:

  • If one of the following events is received
    • INTVAR_EVENT
    • RAND_EVENT
    • USER_VAR_EVENT
    • BEGIN_LOAD_QUERY_EVENT
    • EXECUTE_LOAD_QUERY_EVENT
    • APPEND_BLOCK_EVENT
    • DELETE_FILE_EVENT
  • If a QUERY_EVENT in a DDL context contains a CREATE/DROP TEMPORARY TABLE query
  • If a non-row event is spotted in a DML transaction (inside a BEGIN ... COMMIT or XA block).

Without load query events or the creation of temporary tables the slave no longer needs to:

  • Set @@session.pseudo_thread_id

  • Remove the files from the @@global.slave_load_tmpdir directory

Where to do it

Since the Transaction boundary parser output is already present in the IO thread, it is cheap to add these checks there.
This brings some advantages:

  • Fail faster

  • Less Relay log pollution

It is not enough, though, for several reasons.
First, the slave applier can be seen as a standalone component that can be used without its IO counterpart; Group Replication is one such example.
Second, if a user stops the slave and restarts it with REQUIRE_ROW_FORMAT, the expectation is that the slave will fail for events already logged but not yet applied.

So this check shall primarily be added to the applier part of the slave.

Failure handling

When an invalid event is detected on queuing, the slave shall:

  • Report an error to SHOW SLAVE STATUS and performance_schema.replication_connection_status (E1)
  • Prevent the write of the failing event to the relay log
  • Stop the receiver thread

When an invalid event is detected on application, the slave shall:

  • Report an error to SHOW SLAVE STATUS and performance_schema.replication_applier_status_by_coordinator (E2)
  • Not queue the failing event to any worker
  • Stop the SQL thread and consequently the workers (the same way it does for other coordinator errors)

2.2. Client side

This mainly concerns mysqlbinlog and the execution of events. We will also introduce a new session variable:

      NAME: require_row_format
    SCOPES: session only
      TYPE: boolean
   DEFAULT: OFF
PRIVILEGES: No privileges required to set the value to ON.
            SESSION_ADMIN required to set the value to OFF.

This variable, checked in mysql_execute_command, ensures that no direct DML commands (INSERT, DELETE, etc.) are executed.
The only DML operations allowed must be coded in the SQLCOM_BINLOG_BASE64_EVENT form.
Queries that do not respect the restrictions shall fail (E3).

It also ensures that for CREATE TABLE events, the query can't have a TEMPORARY specifier.

2.3. mysqlbinlog

There shall be a new option to mysqlbinlog:

--require-row-format

When this option is given, mysqlbinlog shall do the following:

  • Stop with an error if a LOAD DATA INFILE event or any other forbidden event listed in 2.1 occurs.

  • Stop with an error if a QUERY_EVENT in a DDL context contains a CREATE/DROP TEMPORARY TABLE query

  • Stop with an error if a non-row event is spotted in a DML transaction (inside a BEGIN ... COMMIT or XA block).

  • Print a SET @@session.require_row_format = 1 statement at the beginning of the output.

  • Omit the printing of SET @@session.pseudo_thread_id, which it usually prints.

USER INTERFACE

4.1. New syntax for CHANGE MASTER TO

This section describes new syntax for CHANGE MASTER

There is one new SQL statement clause:

CHANGE MASTER TO REQUIRE_ROW_FORMAT = [0|1]

The effect is: if REQUIRE_ROW_FORMAT = 1, the channel, once started, enforces the restrictions described in 2.1. The following rules apply to this option:

  1. When a new replication channel is created using CHANGE MASTER, and no REQUIRE_ROW_FORMAT clause is specified, then the channel shall be configured with REQUIRE_ROW_FORMAT = 0.

  2. When RESET SLAVE is used, it shall not affect REQUIRE_ROW_FORMAT.

  3. When RESET SLAVE ALL is used in such a way that all replication channels are removed and a new default channel is created, then the new default channel shall have REQUIRE_ROW_FORMAT = 0.

  4. This setting can only be set when the slave is stopped (SQL and IO threads).

  5. This value shall be 1 for Group Replication channels when they are created.

  6. This option cannot be disabled for group replication channels.

  7. It is not possible to set a valid, non null user to PRIVILEGE_CHECKS_USER if REQUIRE_ROW_FORMAT = 0 (E7)

  8. It is not possible to set REQUIRE_ROW_FORMAT= 0 if PRIVILEGE_CHECKS_USER is set to a valid user (E6)

  9. The value shall be enforced as being 0 or 1; otherwise, error ER_REQUIRE_ROW_FORMAT_INVALID_VALUE is returned. (E5)
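
The option rules above can be sketched as a single validation step. This is only an illustration under stated assumptions: the type names, the result codes and the choice of which error wins when both clauses appear in one statement are hypothetical and are not taken from the server code.

```cpp
#include <string>

// Hypothetical outcome codes; E5, E6 and E7 refer to the errors in 7.2.
enum class ChangeMasterCheck {
  ACCEPTED,
  CHANNEL_RUNNING,                    // rule 4
  GROUP_REPLICATION_CHANNEL,          // rule 6
  INVALID_VALUE,                      // rule 9 (E5)
  REQUIRE_ROW_FORMAT_NEEDED_BY_USER,  // rule 8 (E6)
  USER_NEEDS_REQUIRE_ROW_FORMAT       // rule 7 (E7)
};

// Hypothetical snapshot of the channel before the statement runs.
struct ChannelState {
  bool threads_running;  // IO or SQL thread is running
  bool is_group_replication;
  bool require_row_format;
  std::string privilege_checks_user;  // empty means NULL
};

// Hypothetical parsed form of the clauses relevant to this check.
struct ChangeMasterRequest {
  bool sets_require_row_format;
  long require_row_format_value;  // raw value, meaningful only if set
  bool sets_privilege_checks_user;
  std::string privilege_checks_user;  // empty means NULL
};

ChangeMasterCheck check_change_master(const ChannelState &channel,
                                      const ChangeMasterRequest &request) {
  if (request.sets_require_row_format) {
    if (channel.threads_running) return ChangeMasterCheck::CHANNEL_RUNNING;
    if (request.require_row_format_value != 0 &&
        request.require_row_format_value != 1)
      return ChangeMasterCheck::INVALID_VALUE;
  }
  // Compute the values the channel would hold after the statement.
  const bool new_require_row_format =
      request.sets_require_row_format ? request.require_row_format_value == 1
                                      : channel.require_row_format;
  const std::string &new_user = request.sets_privilege_checks_user
                                    ? request.privilege_checks_user
                                    : channel.privilege_checks_user;
  if (channel.is_group_replication && !new_require_row_format)
    return ChangeMasterCheck::GROUP_REPLICATION_CHANNEL;
  if (!new_user.empty() && !new_require_row_format)
    return request.sets_require_row_format
               ? ChangeMasterCheck::REQUIRE_ROW_FORMAT_NEEDED_BY_USER
               : ChangeMasterCheck::USER_NEEDS_REQUIRE_ROW_FORMAT;
  return ChangeMasterCheck::ACCEPTED;
}
```

Checking the resulting pair of values, rather than each clause in isolation, keeps rules 7 and 8 symmetric: a statement that sets both PRIVILEGE_CHECKS_USER and REQUIRE_ROW_FORMAT = 1 is accepted in this sketch.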

4.2. Session Variable : require_row_format

A new session variable was added

Name: require_row_format
Type : boolean
Default : OFF
Scope: session only
Dynamic: yes
Replicated: no
Persistable: NO
Credentials: No privileges required to set the value to ON.
             SESSION_ADMIN required to set the value to OFF.
Description: Limit the application of queries to row based events
             and DDLs with the exception of temporary table creation/deletion. (D1)

When set, the restrictions described above (2.2) are enforced from that point on for that session.
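
The asymmetric privilege rule can be illustrated with a small predicate. This is a sketch only; the function name and the boolean standing in for the SESSION_ADMIN check are assumptions made for illustration.

```cpp
// Returns true when the session may assign new_value to
// @@session.require_row_format.
// has_session_admin is a hypothetical stand-in for the server's privilege check.
bool can_set_require_row_format(bool new_value, bool has_session_admin) {
  // Setting the variable to ON never needs a privilege;
  // setting it to OFF requires SESSION_ADMIN.
  return new_value || has_session_admin;
}
```

The asymmetry means an unprivileged session can tighten its own restrictions but cannot loosen them again.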

4.3. mysqlbinlog interface changes

There shall be a new option to mysqlbinlog:

 --require-row-format

This changes the behavior of the utility, making it error out in the cases described above.

5. PERSISTENT CONFIGURATION

File

When @@global.relay_log_info_repository=FILE, the file shall contain one extra line:

  • the flag REQUIRE_ROW_FORMAT

Table

When @@global.relay_log_info_repository=TABLE, the mysql.slave_relay_log_info table shall contain one extra column:

 REQUIRE_ROW_FORMAT
 BOOLEAN
 NOT NULL
 COMMENT 'Indicates whether the channel shall only accept row based events.' (D2)

6. UPGRADES

  1. When the server is upgraded from a version that does not have this worklog, to a version that has it, all existing standard replication channels shall have REQUIRE_ROW_FORMAT = 0 if no PRIVILEGE_CHECKS_USER is defined.

  2. When the server is upgraded from a version that does not have this worklog, to a version that has it, all existing standard replication channels shall have REQUIRE_ROW_FORMAT = 1 if a PRIVILEGE_CHECKS_USER is defined.

  3. When the server is upgraded from a version that does not have this worklog, to a version that has it, all existing group replication channels shall have REQUIRE_ROW_FORMAT = 1.
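
The three upgrade rules reduce to one decision per channel. A minimal sketch, assuming a hypothetical LegacyChannel record that carries only the two facts the rules depend on:

```cpp
#include <string>

// Hypothetical description of a channel as read from a pre-upgrade repository.
struct LegacyChannel {
  bool is_group_replication;          // true for group replication channels
  std::string privilege_checks_user;  // empty when none is defined
};

// Returns the REQUIRE_ROW_FORMAT value the channel receives on upgrade.
bool require_row_format_after_upgrade(const LegacyChannel &channel) {
  if (channel.is_group_replication) return true;            // rule 3
  if (!channel.privilege_checks_user.empty()) return true;  // rule 2
  return false;                                             // rule 1
}
```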

7. OBSERVABILITY

7.1 Performance_schema

The performance_schema.replication_applier_configuration table shall have the following new column:

  REQUIRE_ROW_FORMAT
  BOOLEAN
  COMMENT 'Indicates whether the channel shall only accept row based events.' (D2)

This shall show the value of REQUIRE_ROW_FORMAT for the current channel.

7.2 Error and description messages

E1. Error on IO thread detection of an invalid event

ER_RPL_SLAVE_QUEUE_EVENT_FAILED_INVALID_NON_ROW_FORMAT

eng "The queue event failed for channel '%s' as an invalid event according to REQUIRE_ROW_FORMAT was found."

E2. Error on SQL thread detection of an invalid event

ER_RPL_SLAVE_APPLY_LOG_EVENT_FAILED_INVALID_NON_ROW_FORMAT

eng "The application of relay events failed for channel '%s' as an invalid event according to REQUIRE_ROW_FORMAT was found."

E3. Error when a client query fails against @@session.require_row_format

ER_CLIENT_QUERY_FAILURE_INVALID_NON_ROW_FORMAT

eng "The query does not comply with variable require_row_format restrictions."

E4. ER_RPL_SLAVE_SQL_THREAD_STARTING

On WL#12966 a new message was added

ER_RPL_SLAVE_SQL_THREAD_STARTING_WITH_PRIVILEGE_CHECKS

The issue is that adding yet another message here would lead to a matrix of options and log messages.
So this WL proposes a change in content to the existing error

ER_RPL_SLAVE_SQL_THREAD_STARTING 

Maintaining the error code, the message will now be

"Slave SQL thread%s initialized, starting replication in log
  '%s' at position %s, relay log '%s' position: %s %s"

The extra %s at the end will allow us to append

, user:'{user}'@'{host}', roles: {roles}"

and also

, require_row_format = {require_row_format_var}"
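
How the trailing %s could be filled can be sketched as follows. The function name is hypothetical, and two details are assumptions rather than facts from this worklog: that the user fragment is omitted when no PRIVILEGE_CHECKS_USER is set, and that the flag is rendered as 0 or 1.

```cpp
#include <string>

// Builds the text passed to the final %s of ER_RPL_SLAVE_SQL_THREAD_STARTING.
// An empty user means no PRIVILEGE_CHECKS_USER is configured (assumption).
std::string sql_thread_start_suffix(const std::string &user,
                                    const std::string &host,
                                    const std::string &roles,
                                    bool require_row_format) {
  std::string suffix;
  if (!user.empty()) {
    suffix += ", user:'" + user + "'@'" + host + "', roles: " + roles;
  }
  suffix += ", require_row_format = ";
  suffix += require_row_format ? "1" : "0";  // rendering as 0/1 is assumed
  return suffix;
}
```

Composing the fragments into one argument keeps a single message and a single error code, which is the point of avoiding a matrix of log messages.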

E5. Invalid REQUIRE_ROW_FORMAT value is passed to CHANGE MASTER

ER_REQUIRE_ROW_FORMAT_INVALID_VALUE

eng "The requested value %s is invalid for REQUIRE_ROW_FORMAT, must be either 0 or 1."

E6. REQUIRE_ROW_FORMAT can't be set to 0 when PRIVILEGE_CHECKS_USER is set to a valid value

ER_CLIENT_REQ_ROW_PRIV_CHECKS_USER_NOT_NULL eng "REQUIRE_ROW_FORMAT for replication channel '%.192s' can't be set to %d unless PRIVILEGE_CHECKS_USER is also set to %s."

ER_LOG_REQ_ROW_PRIV_CHECKS_USER_NOT_NULL eng "REQUIRE_ROW_FORMAT for replication channel '%.192s' can't be set to %d unless PRIVILEGE_CHECKS_USER is also set to %s."

E7. PRIVILEGE_CHECKS_USER can't be set to a valid value when REQUIRE_ROW_FORMAT is 0

ER_LOG_PRIV_CHECKS_REQUIRE_ROW_FORMAT_NOT_SET eng "PRIVILEGE_CHECKS_USER for replication channel '%.192s' can't be set to %.64s@%.255s unless REQUIRE_ROW_FORMAT is also set to %d."

ER_CLIENT_PRIV_CHECKS_REQUIRE_ROW_FORMAT_NOT_SET eng "PRIVILEGE_CHECKS_USER for replication channel '%.192s' can't be set to %.64s@%.255s unless REQUIRE_ROW_FORMAT is also set to %d."

D1. Description of @@session.require_row_format

Limit the application of queries to row based events and DDLs with the exception of temporary table creation/deletion.

D2. Description of require_row_format column on tables

Indicates whether the channel shall only accept row based events.

D3. Description of --require-row-format in mysqlbinlog

Fail when printing an event that was not logged using row format or other forbidden events like Load instructions or the creation/deletion of temporary tables.

PLAN

1. Slave side

1.1 Add fields to relay log info

Base: Add a new member in Relay_log_info to store REQUIRE_ROW_FORMAT (File and Table).
Add the field to the relay log class.
Make the constructors initialize it.

Tests:

  • Verify compilation.

Depends on LLD:

  • Might depend on 2.1.
    The reasoning here is that the code always reads the thread session value, and we simply set the IO thread session value at start.

1.2 Add info to performance schema

Observability: Implement REQUIRE_ROW_FORMAT column in performance_schema.replication_applier_configuration.

Tests:

  • Verify that the column contains 0 by default
  • On a further phase test that the value is correct (Depends on phase 1.3)

Depends on LLD:

  • Step 1.1

1.3 Add the syntax for CHANGE MASTER

Syntax: Add syntax for CHANGE MASTER TO REQUIRE_ROW_FORMAT.
Make CHANGE MASTER update the Relay_log_info members.

Tests:

  • Verify that the CHANGE MASTER syntax is accepted.
  • Verify that the CHANGE MASTER has observable effects on the p_s table.
  • Verify that RESET SLAVE has no observable effects on the p_s table.
  • Verify that RESET SLAVE ALL has observable effects on the p_s table.
  • Verify that CHANGE MASTER fails when the channel IO or SQL thread is running.
  • Verify that CHANGE MASTER fails for group replication channels with this OPTION.
  • Verify that REQUIRE_ROW_FORMAT is enabled for group replication channels when created by the user or GR.
  • Verify that it is not possible to set a valid, non-null user to PRIVILEGE_CHECKS_USER if REQUIRE_ROW_FORMAT = 0
  • Verify that it is not possible to set REQUIRE_ROW_FORMAT = 0 if PRIVILEGE_CHECKS_USER is set to a valid user

Depends on LLD:

  • Step 1.2 (hence also 1.1)

1.4 Identify non-RBR events during application (SQL Thread)

In the slave, while applying, check the incoming event types in combination with the transaction boundary parser.
The proto design is

  1. Refactor Transaction_boundary_parser a bit so it receives
     generic Log_event information and not packets or objects.

  2. On the Transaction_boundary_parser, preserve the last returned
     enum_event_boundary_type.

  3. Create a method Transaction_boundary_parser::evaluate_rbr_restrictions

    // INTVAR_EVENT, RAND_EVENT, USER_VAR_EVENT
    if (boundary_type == EVENT_BOUNDARY_TYPE_PRE_STATEMENT)
      error out

    // DDL: temporary tables are forbidden
    if (boundary_type == EVENT_BOUNDARY_TYPE_STATEMENT ||
        current_parser_state == NONE)
      if (query starts with CREATE TEMPORARY or DROP TEMPORARY)
        error out

    // DML: only row events are allowed inside a transaction
    if (boundary_type == EVENT_BOUNDARY_TYPE_STATEMENT &&
        current_parser_state == DML)
      if (event_type != TABLE_MAP_EVENT &&
          event_type != WRITE_ROWS_EVENT(V1) &&
          event_type != UPDATE_ROWS_EVENT(V1) &&
          event_type != DELETE_ROWS_EVENT(V1) &&
          event_type != PARTIAL_UPDATE_ROWS_EVENT &&
          event_type != VIEW_CHANGE_EVENT)
        error out

    // Defend against injected LOAD events
    if (event_type == BEGIN_LOAD_QUERY_EVENT ||
        event_type == EXECUTE_LOAD_QUERY_EVENT ||
        event_type == APPEND_BLOCK_EVENT ||
        event_type == DELETE_FILE_EVENT)
      error out

In order to simplify the design, the easiest place to do this is the SQL thread.

So in rpl_slave.cc we add this tracking code, based on the Transaction boundary parser, to exec_relay_log_event.

This design leaves the door open to moving this code into the applier itself, or even offloading this job to the slave workers.
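
The proto design can be made concrete with a self-contained sketch. It is not the server implementation: the reduced EventType and ParserState enums and the function names are hypothetical stand-ins, and QUERY_STATEMENT is assumed to mean a QUERY_EVENT that the boundary parser has already classified as a statement rather than as transaction control.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, reduced event classification.
enum class EventType {
  QUERY_STATEMENT,
  INTVAR, RAND, USER_VAR,
  BEGIN_LOAD_QUERY, EXECUTE_LOAD_QUERY, APPEND_BLOCK, DELETE_FILE,
  TABLE_MAP, WRITE_ROWS, UPDATE_ROWS, DELETE_ROWS, PARTIAL_UPDATE_ROWS,
  VIEW_CHANGE,
  OTHER
};

// Hypothetical parser state: outside or inside a DML transaction.
enum class ParserState { NONE, DML };

// Case-insensitive prefix match; leading whitespace and comments
// are not handled in this sketch.
bool starts_with_ci(const std::string &text, const std::string &prefix) {
  if (text.size() < prefix.size()) return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (std::toupper(static_cast<unsigned char>(text[i])) !=
        std::toupper(static_cast<unsigned char>(prefix[i])))
      return false;
  }
  return true;
}

// Returns true when the event must be rejected under REQUIRE_ROW_FORMAT = 1.
bool violates_row_format(EventType type, ParserState state,
                         const std::string &query) {
  switch (type) {
    case EventType::INTVAR:
    case EventType::RAND:
    case EventType::USER_VAR:
    case EventType::BEGIN_LOAD_QUERY:
    case EventType::EXECUTE_LOAD_QUERY:
    case EventType::APPEND_BLOCK:
    case EventType::DELETE_FILE:
      return true;  // pre-statement and LOAD events are always forbidden
    case EventType::QUERY_STATEMENT:
      if (starts_with_ci(query, "CREATE TEMPORARY") ||
          starts_with_ci(query, "DROP TEMPORARY"))
        return true;  // temporary table DDL
      // A statement inside a DML transaction is not a row event.
      return state == ParserState::DML;
    default:
      return false;  // row events and everything else pass
  }
}
```

For example, `violates_row_format(EventType::QUERY_STATEMENT, ParserState::DML, "INSERT INTO t VALUES (1)")` rejects a statement-logged insert, while the same call with a WRITE_ROWS event passes.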

Tests:

  • Verify that forbidden events (intvar,rand,..,load) in a RL cause replication to fail and stop
  • Verify that CREATE or DROP TEMPORARY TABLE queries in a RL cause replication to fail and stop
  • Verify that any DML query logged with statement-based replication in a RL causes replication to fail and stop

Depends on LLD:

  • Step 1.3 (hence also 1.1, 1.2)
  • It can still be parallelized with (1-3) assuming a dummy require_row_format var set to true.

1.5 Identify non-RBR events in the binlog event stream (IO Thread)

Add the check to queue_event in rpl_slave.cc.
Check the incoming event types and, in combination with the transaction boundary parser, evaluate whether they are valid.
The proto design is

feed_event() 
if (REQUIRE_ROW_FORMAT)
  if (evaluate_rbr_restrictions())
   error out
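
Combined with the queuing failure handling in 2.1, the receiver-side flow could look like the sketch below. Event and Receiver are hypothetical stand-ins; the violation flag is assumed to come from evaluate_rbr_restrictions, and the relay log is modelled as a plain vector.

```cpp
#include <string>
#include <vector>

// Hypothetical event: a name plus the verdict of the restriction check.
struct Event {
  std::string name;
  bool violates_row_format;
};

// Hypothetical receiver (IO thread) state.
struct Receiver {
  bool require_row_format;
  bool running;
  std::vector<std::string> relay_log;
  std::string last_error;

  // Returns true when the event was written to the relay log.
  bool queue_event(const Event &event) {
    if (!running) return false;
    if (require_row_format && event.violates_row_format) {
      // Report the error (E1), skip the relay log write, stop the receiver.
      last_error = "ER_RPL_SLAVE_QUEUE_EVENT_FAILED_INVALID_NON_ROW_FORMAT";
      running = false;
      return false;
    }
    relay_log.push_back(event.name);
    return true;
  }
};
```

Because the check runs before the write, the failing event never reaches the relay log, which is the "less relay log pollution" advantage described in 2.1.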

Tests:

  • Verify that forbidden events (intvar,rand,..,load) cause replication to fail and stop
  • Verify that CREATE or DROP TEMPORARY TABLE queries cause replication to fail and stop
  • Verify that any DML query logged with statement-based replication causes replication to fail and stop

Depends on LLD:

  • Step 1.4

1.6 Skip privilege related operations under row format only mode

In the slave code, if require_row_format is active then bypass the operations:

 In Query_log_event::do_apply_event
   // skip
   thd->variables.pseudo_thread_id = thread_id;  // for temp tables

In Format_description_log_event::do_apply_event
  // skip 
  cleanup_load_tmpdir();

Tests:

  • Verify that a PRIVILEGE_CHECKS_USER no longer needs SESSION_ADMIN for Query Log Events.

Depends on LLD:

  • Step 1.3 (hence also 1.1, 1.2)
  • It can still be parallelized with (1-3) assuming a dummy require_row_format var set to true.

1.7 Persistent configuration: Save the configuration in replication repositories.

Tests:

  • REQUIRE_ROW_FORMAT configuration exists in table/file.
  • REQUIRE_ROW_FORMAT configuration survives a server restart.

Depends on LLD:

  • Step 1.3 (hence also 1.1, 1.2)

1.8 Upgrades: Add the column to mysql.slave_relay_log_info during upgrade.

The value of the new column shall depend on the data in the table: whether the channel is a group replication channel or has a PRIVILEGE_CHECKS_USER defined. The same rules apply to this field when reading from old relay log info files.

Tests:

  • TC1: Verify that the column is added when doing an upgrade from 5.7
  • TC2: Verify that the column is added when doing an upgrade from 8.0
  • TC3: Verify that the column value is 0 for channels with no PRIVILEGE_CHECKS_USER
  • TC4: Verify that the column is 1 for channels with a defined PRIVILEGE_CHECKS_USER
  • TC5: Verify that the column is 1 for group replication channels
  • TC6: Verify that TC3, TC4 and TC5 are true when upgrading from Table repositories
  • TC7: Verify that TC3 and TC4 are true when upgrading from File repositories

Depends on LLD:

  • Steps 1.7

2. Client side

2.1 Create session sys_var require_row_format

Create a system variable on the session scope.

Name: require_row_format
Type : boolean
Default : OFF
Scope: session only
Dynamic: yes
Replicated: no
Persistable: NO
Credentials: No privileges required to set the value to ON.
             SESSION_ADMIN/SYSTEM_VARIABLES_ADMIN required to set the value to OFF.
Description: Limit the application of queries to row based events
             and DDLs with the exception of temporary table creation/deletion. 

The simplest implementation for the usage of this var is that both the slave and the client code read the thread value. The slave code sets this value for the IO thread.

Tests:

  • Basic tests that the var cannot be set globally
  • Basic tests that the var only accepts boolean values and default is OFF
  • Tests that prove SESSION_ADMIN is needed to set it to OFF. No privileges are needed to set it to ON

Depends on LLD:

  • No dependencies

2.2 Client query execution of non row based events.

The idea here is that on a client session with

@@session.require_row_format = 1,

We disallow any direct DML queries. So the following SQL commands will error out:

SQLCOM_UPDATE,
SQLCOM_INSERT,
SQLCOM_INSERT_SELECT,
SQLCOM_DELETE,
SQLCOM_TRUNCATE,
SQLCOM_LOAD,
SQLCOM_REPLACE,
SQLCOM_REPLACE_SELECT,
SQLCOM_DELETE_MULTI,
SQLCOM_UPDATE_MULTI

We also check temporary table creation and deletion, so we check for

SQLCOM_CREATE_TABLE and
lex->create_info->options & HA_LEX_CREATE_TMP_TABLE

also

SQLCOM_DROP_TABLE and lex->drop_temporary = true

The most likely candidate for the location of this code is an auxiliary method called in mysql_execute_command.
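
Such an auxiliary method could be sketched as below. The enum mirrors the SQLCOM_* names listed above but is a local stand-in, and ParsedStatement is a hypothetical reduction of the lex state to the two flags this check needs.

```cpp
// Local stand-in for the server's command enum; only the values
// relevant to this check are listed.
enum class SqlCommand {
  SQLCOM_SELECT,
  SQLCOM_UPDATE, SQLCOM_INSERT, SQLCOM_INSERT_SELECT, SQLCOM_DELETE,
  SQLCOM_TRUNCATE, SQLCOM_LOAD, SQLCOM_REPLACE, SQLCOM_REPLACE_SELECT,
  SQLCOM_DELETE_MULTI, SQLCOM_UPDATE_MULTI,
  SQLCOM_CREATE_TABLE, SQLCOM_DROP_TABLE,
  SQLCOM_BINLOG_BASE64_EVENT
};

// Hypothetical reduction of the parsed statement.
struct ParsedStatement {
  SqlCommand command;
  bool create_tmp_table;  // stands for HA_LEX_CREATE_TMP_TABLE being set
  bool drop_temporary;    // stands for lex->drop_temporary
};

// Returns true when the statement may run with
// @@session.require_row_format = 1.
bool allowed_under_require_row_format(const ParsedStatement &statement) {
  switch (statement.command) {
    case SqlCommand::SQLCOM_UPDATE:
    case SqlCommand::SQLCOM_INSERT:
    case SqlCommand::SQLCOM_INSERT_SELECT:
    case SqlCommand::SQLCOM_DELETE:
    case SqlCommand::SQLCOM_TRUNCATE:
    case SqlCommand::SQLCOM_LOAD:
    case SqlCommand::SQLCOM_REPLACE:
    case SqlCommand::SQLCOM_REPLACE_SELECT:
    case SqlCommand::SQLCOM_DELETE_MULTI:
    case SqlCommand::SQLCOM_UPDATE_MULTI:
      return false;  // direct DML is rejected (E3)
    case SqlCommand::SQLCOM_CREATE_TABLE:
      return !statement.create_tmp_table;
    case SqlCommand::SQLCOM_DROP_TABLE:
      return !statement.drop_temporary;
    default:
      return true;  // includes SQLCOM_BINLOG_BASE64_EVENT
  }
}
```

The caller would run this check only when the session variable is ON, so sessions with the variable OFF take no extra branch beyond reading it.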

Tests:

  • Check that INSERT, UPDATE, DELETE, REPLACE and LOAD operations are not allowed when @@session.require_row_format = 1
  • Check the above restrictions are not checked when @@session.require_row_format = 0
  • Check no creation or deletion of temporary tables is allowed when @@session.require_row_format = 1
  • Check encoded binary log events can still be executed.

Depends on LLD:

  • Depends on 2.1

3. mysqlbinlog side

3.1 Add a new option

Add to mysqlbinlog the option

--require-row-format

Add the option to the code, along with the information that the print methods consult about the current printing context.

Tests:

  • Just check mysqlbinlog is still working

Depends on LLD:

  • No dependencies

3.2 Change the behavior of mysqlbinlog - printing vars

Update main() in mysqlbinlog.cc to print at the beginning

SET @@session.require_row_format = 1 

Update Query_log_event::print_query_header in log_event.cc
If require row format is active skip the pseudo_thread_id print.
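
The two printing changes can be sketched with plain output streams. The function names are hypothetical, and the exact statement text and delimiter that mysqlbinlog emits are not specified here, so the strings below are placeholders.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Printed once at the beginning of the output.
void print_preamble(std::ostream &out, bool require_row_format) {
  if (require_row_format) out << "SET @@session.require_row_format = 1;\n";
}

// Printed per query event; the pseudo_thread_id line is skipped
// when the option is active.
void print_query_header(std::ostream &out, bool require_row_format,
                        unsigned long thread_id) {
  if (!require_row_format)
    out << "SET @@session.pseudo_thread_id=" << thread_id << ";\n";
}

// Helper that renders the preamble followed by one query header.
std::string render_output(bool require_row_format, unsigned long thread_id) {
  std::ostringstream out;
  print_preamble(out, require_row_format);
  print_query_header(out, require_row_format, thread_id);
  return out.str();
}
```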

Tests:

  • Check mysqlbinlog prints the require_row_format=1 line when --require-row-format is given
  • Check mysqlbinlog does not print pseudo_thread_id related lines when --require-row-format is given

Depends on LLD:

  • Depends on 3.1

3.3 Change the behavior of mysqlbinlog

Update process_event in mysqlbinlog.cc when the above option is given

The proto design, and the more generic approach, is to use the transaction boundary parser methods described in 1.4.

This implies that the transaction parser logic must be stripped of all server dependencies and moved to an external library such as libbinlogevents.

The event loop in the mysqlbinlog process shall then call evaluate_rbr_restrictions when the option --require-row-format is given.

Tests:

  • Verify that forbidden events (intvar,rand,..,load) cause the mysqlbinlog process to error out
  • Verify that CREATE or DROP TEMPORARY TABLE queries cause the mysqlbinlog process to error out
  • Verify that any DML query logged with statement-based replication causes the mysqlbinlog process to error out

Depends on LLD:

  • Depends on 3.1, 1.4