WL#4618: RBR: extended table metadata in the binary log

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
=================

This worklog extends the table metadata that is written to the binary
log in row based logging. The extended metadata serves two major purposes:
* Slave needs the metadata to transfer data smoothly when its table structure
  is different from master's.

* External software needs the metadata to decode row events and store
  the data into external databases(e.g. data warehouse).

SCENARIOS
=========

SCENARIO 1
----------
* User Story

  As a DBA I want to setup MySQL to replicate successfully in ROW
  mode so that replication with master and replicas that have
  different character sets for the same column is possible. E.g.:

  master> CREATE TABLE t1 (a varchar(10) character set latin1);
  slave> CREATE TABLE t1 (a varchar(10) character set utf8);

* Details
  MySQL currently supports statement based replication between
  tables with columns having slightly different data types.
  It includes columns with different character sets.

  If you create this table on master:

  CREATE TABLE t1 (a varchar(10) character set latin1);

  and this table on slave:

  CREATE TABLE t1 (a varchar(10) character set utf8);

  then statement based replication works fine.
  When slave re-parses a statement, it converts strings to utf8
  from character_set_client of the event to utf8.

  Row based replication doesn't work because
  it operates with binary images. So the strings in
  binary log are represented in latin1 character set,
  and no conversion happens to utf8.

  Possible solution how to make it work:
  - Extend table map event to transfer character set ID together
    with other column meta data, like data type, length and so on.
  - If character set of the data in the image is not the same with
    character of the column, then apply character set conversion.

SCENARIO 2
----------

* User Story
  As a DBA I want to setup MySQL to replicate successfully in ROW
  mode so that replication with master and replicas that have
  different signedness for the same column is possible. E.g.:

  @@slave_type_conversions='ALL_NON_LOSSY'
  MASTER> CREATE TABLE t (c1 INT UNSIGNED);
  SLAVE> ALTER TABLE t CHANGE COLUMN c1 c1 BIGINT SIGNED;
  MASTER> INSERT INTO t(c1) values (3916586877); 

* Details
  If using different signed columns on master and slave, the slave
  may go out of sync because there is no information shipped with
  the table metadata regarding the signedness of the field.

  For example (if using @@slave_type_conversions='ALL_NON_LOSSY'):

  MASTER> CREATE TABLE t (c1 INT UNSIGNED);
  SLAVE> ALTER TABLE t CHANGE COLUMN c1 c1 BIGINT SIGNED;
  MASTER> INSERT INTO t(c1) values (3916586877);
  MASTER> SELECT * FROM t; ---> 3916586877
  SLAVE> SELECT * FROM t; ---> -378380419

SCENARIO 3
----------
* User Story
  As a developer I want a ROW based MySQL replication stream to
  contain all metadata so that I am able to write applications that
  mine and integrate the ROW based replication stream into third
  party tools or databases, without having to search for metadata
  elsewhere other than the binary log itself.

* Details
  Some [3] have, for instance, created their own Table_metadata_log_event,
  which contains information on SIGNEDNESS, column names, column comments,
  etc.

  When decoding RBR events, the external programs need to know the metadata of
  them(e.g. ENUM string values, Column Names, PK etc.). It could get the
  metadata by querying the MySQL Server which generated the binary log. But
  that is invalid if the data structure is already updated. So it is better to
  log the metadata.

SCENARIO 4
----------
* User Story
  As a MySQL Replication developer, I want to be able to leverage
  the new metadata so that the ROW based replication applier is
  able to cope with different schemas on master and replicas. This
  enables is a stepping stone to better support rolling schemas
  upgrades when replicating using ROW based logs.

* Details
  There is a online DDL solution, which first changes slave's data structure
  and then change master's data structure. So the table on slave may have
  more or less columns, different column types and different column orders.
  If required metadata is logged, then slave can support the online DDL
  solution better. However, making the applier aware of the additional
  metadata when applying ROW events is out of this worklog scope.


FUNCTIONAL REQUIREMENTS
=======================
F1. SIGNED/UNSIGNED flags for numbers
    The binary log SHALL contain one flag per numerical column
    type stating whether it is signed or unsigned.

    The following types SHALL have an associated flag:
    - TINYINT/SMALLINT/MEDIUMINT/INT/BIGINT
    - DECIMAL/FLOAT/DOUBLE
    - TINYINT/SMALLINT/MEDIUMINT/INT/BIGINT
    - DECIMAL/FLOAT/DOUBLE

F2. CHARACTER SET
    The binary log SHALL contain character set information of character
    columns.

    The following types SHALL have character set information.
    CHAR/VARCHAR/TEXT

    Note: JSON is a character type. But the requirement doesn't
          include JSON type, since it just uses utf8mb4.

F3. Column Name
    The binary log SHALL contain column names of tables.

F4. String value of SET
    The binary log SHALL contain the string values of SET columns.

F5. String value of ENUM
    The binary log SHALL contain the string values of ENUM columns.

F6. PRIMARY KEY
    The binary log SHALL contain  primary key information of tables.

    Primary key information includes:
    - Fields in defined order
    - Prefix length used in the index defination.

F7. SQL types of every columns
    The binary log SHALL contain information to describe SQL
    types of columns. The information SHALL be able describe the exact
    SQL types of columns.

    NOTE: Some SQL types are just aliases of other SQL types. Server just
          store them as the real SQL types. So the requirement doesn't
          include information of alias SQL types. 
          Below are alias SQL types:
          - BOOL is just an alias of TINYINT.
          - NUMERIC is just an alias of DECIMAL.
 
NON-FUNCTIONAL REQUIREMENTS
===========================
NF1. Cross-version replication OLD -> NEW MUST work. OLD here means
     (5.5, 5.6, 5.7) and NEW means (5.8).

NF2. There SHALL not be any new restrictions regarding replicating
     form NEW -> OLD due to this worklog, even when the new
     metadata is replicated. OLD here means (5.5, 5.6, 5.7) and
     NEW means (5.8).

NF3. The design should support matadata expending in future.


BACKGROUND
==========
Below is some background information you may be interested. But you don't need
to review it.

Libing, here are a few notes I (Luis) have on the requirements:

Requirements from Twitter (see link in HLD):
- Column metadata
  - length
  - sql type
  - scale
    - # of decimal digits in fractional part of decimal data type
  - character set
  - PK, UK, non-null UK, ...
  - column name
  - sql type name
  - column comment

- table metadata
  - table identifier
  - # of columns

Requirements from OGG:
- column metadata:
  High prio:
  - nullable or not
  - PK or not
  - signed/unsigned
  - sql type
  - max character lengths
  - max octect length
  - character set
  - column name

  Less prio:
  - enumeration string names (not just the native values - integers)
  - unique index or not

Other requirements for ROLLING SCHEMA CHANGES and RBR:
- Apply rows relying on column names rather than order of columns in
  the event - lift the requirement that columns are added or removed
  at the end of the table.
- Fire triggers on the slave when applying RBR (out of this worklog
  scope).
  
  Schema change is roughly something like this:
  - create new table copy of original table on the slave
  - create log table on the slave
  - install triggers on original table
  - copy original table to new table
  - fire triggers on original table to log changes to log table
    - !! RBR does not fire triggers, SBR does though !!
  - apply log once copy original -> new is done
  - rename original to original_obsolete, new to original
  - drop original_obsolete and log

  (Sometimes a log table is not needed and triggers are fired to the
  new table directly. This can be done on the master or on the
  slaves. On the master takes capacity from the writes and generates a
  lot of IO.)


HIGH LEVEL SPECIFICATION
========================
- Extending Table_map_log_event
  Table_map_log_event already includes part of the metadata required by
  the requirements, so this design will extend Table_map_log_event to
  contain all the metadata described in the requirements.

- Some metadata should be optional
  Some sorts of metadata(e.g. column names) which occupy farely big
  storage are not reqired in many application environments.
  They are designed as optional metadata. And users can decide
  the metadata binlog behavior through an system variable.

- Metadata for requirements "F7. SQL types of every columns" and
  "F2. Character set"

  Currently, Table_map_log_event contains a binlog_type for each
  column. Most of them are one-to-one correspond to SQL types.

  Below is the map between binlog_type and SQL type:

    Binlogged Type                      SQL Type
    ----------------                    ------------
    MYSQL_TYPE_TINY                     TINYINT/BOOL
    MYSQL_TYPE_SHORT                    SMALLINT
    MYSQL_TYPE_INT24                    MEDIUMINT
    MYSQL_TYPE_LONG                     INT
    MYSQL_TYPE_LONGLONG                 BIGINT
    MYSQL_TYPE_NEWDECIMAL               DECIMAL/NUMERIC
    MYSQL_TYPE_FLOAT                    FLOAT
    MYSQL_TYPE_DOUBLE                   DOUBLE
    MYSQL_TYPE_BIT                      BIT
    MYSQL_TYPE_YEAR                     YEAR
    MYSQL_TYPE_NEWDATE                  DATE
    MYSQL_TYPE_TIME2                    TIME
    MYSQL_TYPE_DATETIME2                DATETIME
    MYSQL_TYPE_TIMESTAMP2               TIMESTAMP
    MYSQL_TYPE_STRING                   CHAR/BINARY/SET/ENUM
    MYSQL_TYPE_VARCHAR                  VARCHAR/BINARY
    MYSQL_TYPE_BLOB                     TINYBLOB/BLOB/MEDIUMBLOB/LOGBLOB/
                                        TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT
    MYSQL_TYPE_GEOMETRY                 GEOMETRY
    MYSQL_TYPE_JSON                     JSON

  The binlog types of SET and ENUM are MYSQL_TYPE_STRING, and their
  real type(MYSQL_TYPE_SET, MYSQL_TYPE_ENUM) is logged in metadata
  field. So they are determinable.

  With current Table_map_log_event metadata, CHAR and BINARY are not
  distinguishable, VARCHAR and VARBINARY are not distinguishable.
  BLOB and TEST are not distinguishable.

  Since the only different between CHAR/VARCHAR/TEXT and
  BINARY/VARCHAR/BLOB are character set. This design will use
  character set to distinguish them.

  "Requirement F2. Character set" just requires character set
   for CHAR/VARCHAR/TEXT. But in this design, Table_map_log_event
   will also contain character set of BINARY/VARBINARY/BLOB columns.

  All geometry types are binlogged as MYSQL_TYPE_GEOMETRY, the real types
  are not binlogged. we should binlog them. Real Geometry types includes:
  - GEOMETRY
  - POINT
  - LINESTRING
  - PONYGON
  - MULTIPOINT
  - MULTILINESTRING
  - MULTIPONYGON
  - GEOMETRYCOLLETION

INTERFACE SPECIFICATION
=======================

I1 - A New System Variable

     Variable Name : binlog_row_metadata
     Scope:  GLOBAL
     Type:   ENUM
     Dynamic: Yes
     Values:
       MINIMAL    includes SIGNED/UNSIGED flags, column character set
                  and geometry type metadata.
       FULL       All metadata is included.
     Default: MINIMAL

     By default mimial metadata is logged. In MINIAL mode, only SIGNED flags
     , column character set and geometry type are logged.

     To change the behavior permanentely, users need to update
     configuration file.

I2 - Table_map_log_event Format

     +------------------------------------------------+
     |                                                |
     |    Original fields of Table_map_log_event      |
     |                                                |
     +------------------------------------------------+
     | New field |         ...            | New field |
     +------------------------------------------------+
     | New field |         ...            | New field |
     +------------------------------------------------+

     Original fields are the fields implemented before this worklog, they are
     not changed. All new metadata is implemented as new fields and appended at
     the end of original Table_map_log_event fields.

     * Cross Version Compatibility
       It guarantees no cross version compatibility problem.
       OLD - the server before this worklog
       NEW - the server includes this worklog

       OLD --> NEW
       NEW server can read the Table_map_log_event generated by OLD server.

       NEW --> OLD
       OLD server just read the original fields. New fields will be ignored.
       OLD server already has the logic, so we don't need to patch it.


  - Format of New Fields

    All new field use below format:
    +------+----------+---------+
    | Type |  Length  |  Value  |
    +------+----------+---------+

    Type:   1 Byte, unsiged integer
    Length: 1-4 Bytes, packed length format.
    Value:  It varies agains Type.

    Type Values:
    1 - UNSIGNED flag of numeric columns
    2 - Default character set of string columns
    3 - Character set of string columns
    4 - Column name
    5 - String value of SET columns
    6 - String value of ENUM columns
    7 - Primary key without prefix
    8 - Primary key with prefix
    9 - Geometry type of geometry columns

    2 and 3 don't appear together. They are just two ways to pack character set
    information. When binlogging, it just log character set in the way which
    occupy less storage.

    7 and 8 don't appear together. 7 is for the primary keys which just use
    some prefix of columns. 8 is for the primary keys which use complete
    columns.

    * Order of Fields
      The order of new fields are not defined. So the new fields can be logged
      in any order. E.g.
      +---------------+----------------------+
      | Character set | SIGNED/UNSIGNED flag |
      +---------------+----------------------+
      or
      +----------------------+---------------+
      | SIGNED/UNSIGNED flag | Character set |
      +----------------------+---------------+

  - SIGNED/UNSIGNED Flag
    +-------------------------+
    |     Flags               |
    +-------------------------+
    Flags: (Numeric Column Count+7)/8) Bytes.

    Each bit describes one numeric column's SIGNED/UNSIGNED flag.
    0 - signed
    1 - unsigned

  - Default character set of string columns
    +--------------+---------+--------------+---------+----------+-------------+
    | Default      | Column1 | Column1      |   ...   | ColumnN  | ColumnN     |
    | Collation ID | Index   | Collation ID |         | Index ID | Collation ID|
    +--------------+---------+--------------+---------+----------+-------------+
    Collation ID: 1-4 Bytes, packed length format.
    Column Index: 1-2 Bytes, packed length format.

    This field is logged when most of the columns have same character sets.

  - Character set of string columns
    +--------------+---------+--------------+
    | Column1      |   ...   | ColumnN      |
    | Collation ID |   ...   | Collation ID |
    +--------------+---------+--------------+
    Collation ID: 1-4 Bytes, packed length format.

    This field is logged when most of the columns have different character sets.

  - Column Name
    +---------+---------+--------+---------+---------+
    | Column1 | Column1 |  ...   | ColumnN | ColumnN |
    | Length  | Name    |        | Length  | Name    |
    +---------+---------+--------+---------+---------+
    Length : 1 Byte. unsiged integer
    Name   : Length Bytes. String without '\0'

  - String Value of SET Columns
  - String Value of ENUM Columns
    +---------+---------+--------+---------+---------+
    | Column1 | Column1 |  ...   | ColumnN | ColumnN |
    | Length  | Values  |        | Length  | Name    |
    +---------+---------+--------+---------+---------+

    Length : 1-4 Bytes. packed length
    Values : Length Bytes. Value's internal structure is:
    +--------+--------+---------+----------------+
    | Value1 | Value1 |  ...   | ValueN | ValueN |
    | Length |        |        | Length |        |
    +--------+--------+---------+----------------+
    Length: 1-4 Bytes. packed length.
    Value : String value without "\n"

  - Primary Index Without Prefix
    +--------------+--------+----------------+
    | Column1 Index |  ...   | ColumnN Index |
    +--------------+--------+----------------+
    Column Index : 1-3 Bytes. Packed length format of column index
                   0 < Column Index < Column Count

  - Primary Index With Prefix
    +---------+---------+---------+---------+---------+
    | Column1 | Column1 |   ...   | ColumnN | ColumnN |
    | Index   | Prefix  |         | Index   | Prefix  |
    +---------+---------+---------+---------+---------+
    Column Index : 1-3 Bytes. Packed length format of column index
                   0 < Column Index < Column Count
    Column Prefix: 1-3 Bytes. Packed length format  of column prefix length.
                   0 mean no prefix.

  - Geometry Types
    +---------------+--------+----------------+
    | Geometry Type |  ...   | Geometry Type  |
    +---------------+--------+----------------+
    Geometry Type: 1-3 Bytes. Packed length format of geometry type.

I3 - Prints the extra metadata through mysqlbinlog
     In this worklog, the feature is just for testing purpose.

     * Option --print-table-metadata
       Type BOOL, values: ON, OFF. 

     * Output Format
       It has two parts: column information and primary key information.
       Each of them will be printed as a SQL single line comment.

       Column information looks like:
       # Columns(c1 INT, c2 MEDIUMINT UNSIGNED, ...)
       if column names are not binloged, colum names will not be printed.
       # Columns( INT, MEDIUMINT UNSIGNED, ...)

       Character is printed in one of the below format:
       - Every character column is followed by a character set
         # Columns(c1 CHAR(10) CHARSET latin1 COLLATION latin1_swedish_ci,
	  c2 TEXT CHARSET utf8 COLLATE utf8_unicode_520_binary)
       - Charset columns may or may not followed by a charset set and
         there is a default charset for the columns without charset.
	 # Columns(c1 CHAR(10), c2 TEXT CHARSET utf8 COLLATE
	  utf8_unicode_520_binary) DEFAULT CHARSET latin1 COLLATE
	  latin1_swedish_ci

        NOTE: if a character set has binary charset it will be printed as
        BINARY/VARBINARY/TINYBLOB/MEDIUMBLOB/BLOB/LONGBLOB types.

      Primary Key is printed in below format:
      # PRIMARY KEY(c1, c3(10), c2)
No framework changes in this worklog, since it just extends Table_map_log_event.

So the LLD is pretty simple.

- Table_map_log_event::write_data_body()
  Expands it to binlog the new metadata field.

- Table_map_log_event::print()
  Expands it to print metadata detail for mysqlbinlog.