WL#15294: Extending GTID with tags to identify group of transactions

Affects: Server-8.x — Status: Complete


EXECUTIVE SUMMARY
===============================================================================
  This worklog is aimed at easier identification of groups of
  transactions that were executed for different purposes. Right now,
  the user is able to set the GTID for the next
  transaction by setting GTID_NEXT to UUID:NUMBER. The user is also
  allowed to set GTID_NEXT to AUTOMATIC, which means that the server
  will generate GTIDs for transactions executing within the current
  session scope. The goal of this worklog is to allow the user to
  specify a name for a group of transactions, so that
  the user can easily distinguish this group by simply looking at
  transaction GTIDs.

  This worklog extends the GTID definition. The current GTID consists
  of a unique source UUID and a transaction sequence number.
  This worklog introduces a GTID tag, which the user may assign to a single
  transaction or to a group of transactions.

  Therefore, the definition of a tagged transaction GTID is:
  UUID:TAG:NUMBER. When the transaction tag is unspecified, the transaction
  keeps the UUID:NUMBER definition.

  The user is able to assign a tag to a transaction GTID by executing the
  SET GTID_NEXT command. The user can assign a tag to a single transaction
  with a specified UUID and GNO, or to all transactions in the current
  session scope whose GTIDs are generated automatically.

  TAG definition accepted in the system is the following:

  ^[a-zA-Z_][a-zA-Z0-9_]{0,31}$

  which means that:
  - a tag consists of up to 32 characters (<=32)
  - a tag accepts ASCII letters ('a'-'z' and 'A'-'Z'),
    digits (0-9), and the underscore character; a tag must start with a letter
    or an underscore
  - tags are case insensitive; after the user provides a tag,
    it is normalized to contain only lower-case letters

  Please note that the user may choose to skip tag definition. This way, tag
  will be empty.
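
  The tag rules above can be checked mechanically. The following is a minimal
  Python sketch, not the server's implementation: the function name
  `normalize_tag` is hypothetical, and only the regular expression, the
  lower-case normalization and the empty-tag case come from the text.

```python
import re

# Pattern taken from the text; fullmatch() makes the ^ and $ anchors implicit.
TAG_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]{0,31}")


def normalize_tag(raw: str) -> str:
    """Return the normalized (lower-case) tag, or raise ValueError.

    An empty string is accepted and means "no tag" (untagged GTID).
    """
    if raw == "":
        return ""
    if TAG_PATTERN.fullmatch(raw) is None:
        raise ValueError(f"invalid GTID tag: {raw!r}")
    return raw.lower()


if __name__ == "__main__":
    print(normalize_tag("Admin_1"))  # lower-cased form of the tag
```

  Because tags are normalized, `Admin_1` and `admin_1` denote the same tag.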

  SET gtid_next='AUTOMATIC:TAG' is only allowed when gtid_mode is ON or
  ON_PERMISSIVE. If gtid_mode is OFF or OFF_PERMISSIVE,
  SET gtid_next='AUTOMATIC:TAG' gives an error. If the current mode
  is ON or ON_PERMISSIVE, and any ongoing session has gtid_next set
  to 'AUTOMATIC:TAG', the user is not able to change the mode to any
  of the incompatible GTID modes (OFF / OFF_PERMISSIVE).

  Setting gtid_next to 'UUID:TAG:GNO' is allowed when the
  current GTID mode is ON, ON_PERMISSIVE or OFF_PERMISSIVE.
  Otherwise, setting a tag produces an error. If the current mode is ON,
  ON_PERMISSIVE or OFF_PERMISSIVE, and a GTID that includes a TAG is
  specified for the next transaction, the user is not able to change the mode
  to any of the incompatible GTID modes (OFF).
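
  The two mode rules above can be summarized as a small lookup. This is an
  illustrative Python sketch only; the function name and the `automatic`
  parameter are hypothetical and do not correspond to server code.

```python
# gtid_mode values under which a tagged GTID_NEXT may be set, per the text.
TAGGED_AUTOMATIC_MODES = {"ON", "ON_PERMISSIVE"}
TAGGED_ASSIGNED_MODES = {"ON", "ON_PERMISSIVE", "OFF_PERMISSIVE"}


def tagged_gtid_next_allowed(gtid_mode: str, automatic: bool) -> bool:
    """Return True if a tagged GTID_NEXT value is accepted in gtid_mode.

    automatic=True models the automatic tagged form, automatic=False the
    fully specified (assigned) tagged form.
    """
    allowed = TAGGED_AUTOMATIC_MODES if automatic else TAGGED_ASSIGNED_MODES
    return gtid_mode in allowed


if __name__ == "__main__":
    for mode in ("OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"):
        print(mode,
              tagged_gtid_next_allowed(mode, automatic=True),
              tagged_gtid_next_allowed(mode, automatic=False))
```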

USER/DEV STORIES
===============================================================================

  US1. As a MySQL user I want to assign a specific name tag to a group of
    transaction GTIDs
    * so that I can distinguish transactions that are from
      domain 1 (e.g. normal data) from domain 2 (e.g. admin operations) just by
      looking at the transaction GTID.

  US2. As a MySQL server administrator, I want to restrict the use of SET
    GTID_NEXT functionality to a given set of MySQL users (or roles)
    * so that I ensure that only those users (related to a given data domain)
      can commit new transactions with assigned name tags.

SCOPE
===============================================================================

  After this worklog, the user is able to provide a custom
  transaction TAG to be applied at commit time to transactions that
  originate in the same session (or at certification time when
  running the Group Replication plugin). The user is able to set the TAG for
  the upcoming transaction or transactions by executing the SET GTID_NEXT
  command. New allowed values for GTID_NEXT are:
   - UUID:TAG:GNO (ASSIGNED_GTID)
   - AUTOMATIC:TAG (AUTOMATIC_GTID)
  Setting GTID_NEXT to:
   - UUID:GNO
   - AUTOMATIC
  results in an empty transaction tag.

  The scope of this worklog is changing the implementation of the GTID
  representation in the system. GTID tags are introduced,
  which may be assigned by a user with specific privileges.
  Changes include: adjusting the implementation of the GTID (adding a Tag
  component), changes in GTID generation at commit time in asynchronous
  replication, changes in GTID generation at certification time in GR, GTID
  reading and writing, and encoding and decoding of GTID related events. This
  worklog also requires changes in all of the places that propagate or present
  information about the transaction GTID, which include, among others: GTID
  related events, GTID related system messages, the mysqlbinlog client, the
  InnoDB redo log and commands presenting information about the current system
  state that include information about transaction GTIDs.

  This WL introduces a new privilege (TRANSACTION_GTID_TAG)
  under which the user (or replication applier) may set a tag for the next
  transaction.

LIMITATIONS
===============================================================================

  L1. When using a fully specified GTID, that is UUID:TAG:GNO,
  the user is responsible for providing a GTID that is unique in a given
  replication topology.
    Note: inherited from UUID:GNO - ASSIGNED_GTID

REFERENCES
===============================================================================
  None.

FUNCTIONAL REQUIREMENTS:
===============================================================================
FR1. The user shall be allowed to provide a tag for a fully specified GTID.

FR2. The user shall be allowed to provide the TAG for an automatically
   generated GTID.

FR3. GTID definition shall be TSID:GNO, being:
  * TSID - Transaction Source Id, which is a pair of:
   - UUID - UUID of the source from which transaction originates
   - Tag - Transaction Tag, name tag assigned by the user, may be empty
  * GNO - transaction sequence number.

FR4. GTID tags will be persisted alongside the other GTID components in
  all the storage media where GTIDs are currently persisted (binlog, redo log,
  gtid_executed table).

FR5. TAG part of the transaction GTID, provided by the user by execution
   of the SET GTID_NEXT command, shall be applicable for
   transactions executed in the current session scope committing after
   execution of the SET GTID_NEXT command.

FR6. For automatic GTID generation, the source shall automatically generate
  a transaction sequence number that is unique
  for a pair of UUID and a tag.

FR7. For automatic GTID generation, the source shall not produce gaps
  in generation of a GTID for any TSID.
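
FR6 and FR7 together imply one independent, gap-free sequence of GNOs per
TSID. The sketch below illustrates this in Python under simplifying
assumptions: the class `GnoGenerator` and its methods are hypothetical, the
used numbers are kept in plain sets rather than interval sets, and "next free
number" is taken to mean the smallest unused GNO.

```python
from collections import defaultdict


class GnoGenerator:
    """Hands out the smallest unused GNO, tracked separately per TSID."""

    def __init__(self):
        # (uuid, tag) -> set of GNOs already committed or owned
        self._used = defaultdict(set)

    def mark_used(self, uuid: str, tag: str, gno: int) -> None:
        """Record a GNO taken by a fully specified GTID."""
        self._used[(uuid, tag)].add(gno)

    def next_gno(self, uuid: str, tag: str = "") -> int:
        """Return and reserve the smallest free GNO for the TSID (uuid, tag)."""
        used = self._used[(uuid, tag)]
        gno = 1
        while gno in used:
            gno += 1
        used.add(gno)
        return gno


if __name__ == "__main__":
    gen = GnoGenerator()
    uuid = "11111111-1111-1111-1111-111111111111"
    print(gen.next_gno(uuid), gen.next_gno(uuid), gen.next_gno(uuid, "admin"))
```

The untagged sequence and the `admin` sequence advance independently, so the
same GNO may appear once under each TSID.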

FR8. Server shall provide functionality of skipping tagged GTIDs in the
     COM_BINLOG_DUMP_GTID.

INTERFACE:
===============================================================================

NFR1. User shall be able to provide a tag of transaction GTID for the current
      session by execution of the SET GTID_NEXT command.

NFR2. Server shall accept transaction tag in one of the following formats:
    a) UUID:TAG:GNO
    b) AUTOMATIC:TAG

NFR3. System shall accept TAGS in the format in line with the following
  regular expression: ^[a-zA-Z_][a-zA-Z0-9_]{0,31}$ (case insensitive)

NFR4. Text representation of the GTID shall be one of the following:
  * UUID:GNO (empty tag)
  * UUID:Tag:GNO (with a tag assigned by the user)
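
The two text representations in NFR4 can be parsed and printed as sketched
below. This is a hedged Python illustration, not the server's parser: the
names `Gtid`, `parse_gtid` and `format_gtid` are hypothetical, and the checks
that a GNO is a positive decimal number and that the UUID is canonicalized via
Python's `uuid` module are assumptions of this sketch.

```python
import re
import uuid as uuid_lib
from typing import NamedTuple

TAG_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]{0,31}")


class Gtid(NamedTuple):
    uuid: str
    tag: str  # empty string means "untagged"
    gno: int


def parse_gtid(text: str) -> Gtid:
    """Parse 'UUID:GNO' or 'UUID:Tag:GNO' into a Gtid tuple."""
    parts = text.split(":")
    if len(parts) == 2:
        uuid_text, tag, gno_text = parts[0], "", parts[1]
    elif len(parts) == 3:
        uuid_text, tag, gno_text = parts
        if TAG_PATTERN.fullmatch(tag) is None:
            raise ValueError(f"invalid tag: {tag!r}")
    else:
        raise ValueError(f"malformed GTID: {text!r}")
    if re.fullmatch(r"[0-9]+", gno_text) is None or int(gno_text) < 1:
        raise ValueError(f"invalid GNO: {gno_text!r}")
    canonical_uuid = str(uuid_lib.UUID(uuid_text))  # ValueError if malformed
    return Gtid(canonical_uuid, tag.lower(), int(gno_text))


def format_gtid(gtid: Gtid) -> str:
    """Print a Gtid; the tag component is omitted when it is empty."""
    if gtid.tag:
        return f"{gtid.uuid}:{gtid.tag}:{gtid.gno}"
    return f"{gtid.uuid}:{gtid.gno}"


if __name__ == "__main__":
    g = parse_gtid("11111111-1111-1111-1111-111111111111:Admin:3")
    print(g, format_gtid(g))
```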

SECURITY:
===============================================================================
NFR5. Change of GTID for the next transaction shall be allowed only if the user
executing the transaction has the TRANSACTION_GTID_TAG and at least one of the: 
  SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN or REPLICATION_APPLIER
  privileges.

NFR6. Change of GTID for the next transaction in the replication applier thread
shall be allowed only if the account for the replication channel has
the following privileges:
  TRANSACTION_GTID_TAG and REPLICATION_APPLIER.
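
The privilege rules of NFR5 and NFR6 reduce to simple set checks. The Python
sketch below is illustrative only; the function names are hypothetical and the
privilege names are the ones listed in the requirements.

```python
# Privileges of which at least one must accompany TRANSACTION_GTID_TAG (NFR5).
SETTER_PRIVILEGES = {
    "SYSTEM_VARIABLES_ADMIN",
    "SESSION_VARIABLES_ADMIN",
    "REPLICATION_APPLIER",
}


def user_may_set_tagged_gtid(privileges: set) -> bool:
    """NFR5: TRANSACTION_GTID_TAG plus at least one setter privilege."""
    return ("TRANSACTION_GTID_TAG" in privileges
            and bool(privileges & SETTER_PRIVILEGES))


def applier_may_set_tagged_gtid(privileges: set) -> bool:
    """NFR6: the channel account needs both privileges."""
    return {"TRANSACTION_GTID_TAG", "REPLICATION_APPLIER"} <= privileges


if __name__ == "__main__":
    print(user_may_set_tagged_gtid(
        {"TRANSACTION_GTID_TAG", "SESSION_VARIABLES_ADMIN"}))
```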

NFR7. When upgrading, every user with the BINLOG_ADMIN privilege shall be
granted the TRANSACTION_GTID_TAG privilege.

OBSERVABILITY:
===============================================================================

NFR8. The user shall be able to observe an assigned transaction tag in all
  observable GTID representations, be it tables, commands, state variables
  or output produced by tools such as mysqlbinlog


ASSUMPTIONS OF USE:
===============================================================================

AOU1. When tagging a fully specified GTID, the user shall provide a GTID
    that is unique in a given replication topology.

SUMMARY OF THE APPROACH
===============================================================================

## Purpose

This worklog is aimed at extending the current identification of
transactions within the system. It lets the user tag groups of
transactions, that is, give a group of transactions a specific
name. This way, the user will be able to easily distinguish between groups
of transactions by looking at the transaction GTID and searching for
a specific transaction tag. The server already provides functionality to assign
a unique or automatically generated GTID to the next transaction or
transactions executed within the current session scope. Instead of using the
source UUID as the first part of the transaction identifier, we introduce the
Transaction Source Identifier (TSID), composed of the source UUID and a
transaction tag (Tag).
TSID will replace SID as the first component of the transaction GTID.

## Scope

User provided tag will be preserved for the current session.

Generation: No change is foreseen in the GTID generation function
used at commit time. The algorithm searches for the next free
(not committed, not owned) number for the next transaction. The function
will need to ingest a TSID instead of a UUID
(functionality of GTID_NEXT=AUTOMATIC). This way, the user
will be able to distinguish between certain types of transactions by
looking at the transaction tag, e.g.:

11111111-1111-1111-1111-111111111111:1-4
11111111-1111-1111-1111-111111111111:admin:1-3
11111111-1111-1111-1111-111111111112:admin2:1-10
11111111-1111-1111-1111-111111111112:admin:1-10

Please note that in case of multi-source topologies, the user is allowed
to include transactions of different origin in the same group, identified by
a specific tag.

SET gtid_next='AUTOMATIC:TAG' is only allowed when gtid_mode is ON or
ON_PERMISSIVE. If gtid_mode is OFF or OFF_PERMISSIVE,
SET gtid_next='AUTOMATIC:TAG' gives an error. If the current mode
is ON or ON_PERMISSIVE, and any ongoing session has gtid_next set
to 'AUTOMATIC:TAG', the user is not able to change the mode to any
of the incompatible GTID modes (OFF / OFF_PERMISSIVE).

Setting gtid_next to 'UUID:TAG:GNO' is allowed when the
current GTID mode is ON, ON_PERMISSIVE or OFF_PERMISSIVE.
Otherwise, setting a tag produces an error. If the current mode is ON,
ON_PERMISSIVE or OFF_PERMISSIVE, and a GTID that includes a TAG is
specified for the next transaction, the user is not able to change the mode
to any of the incompatible GTID modes (OFF).

The scope of this WL assumes:

* changes in GTID definition by defining TSID that consists of source
UUID and transaction tag. Every TSID will be mapped into one SIDNO.
* algorithms will be changed to ingest and manipulate TSID instead of UUID
* changes in GTID read, write, encode and decode functions
* modification of GTID events which propagate information regarding the
  transaction GTID (see PROTOCOL CHANGES section below)
* changes in group replication:
  Group replication does not use automatic assignment invoked in the binlog
  commit flush stage. Instead, certifier is responsible for assigning automatic
  GTID for the upcoming transactions. Defined values for the GTID NEXT are
  sent from the primary and GTIDs are generated and assigned in all group
  members.
  Gtid specification is transmitted using the Transaction_context_event. Since
  this WL does not introduce any new transaction GTIDs specification types,
  Transaction_context_event will possibly carry tagged GTIDs, but its structure
  will remain unchanged.

  In the current implementation, certifier keeps track of free GTIDs for the group
  UUID (one SIDNO) in a separate intervals set.
  This worklog will extend existing structures to track free GTIDs for
  separate TSIDs (SIDNOs).

  Moreover, the Certifier holds currently assigned interval for each GR member.
  This implementation will be extended
  to keep track of currently assigned interval of free GTIDs for each
  TSID (SIDNO) used by each GR member.

  Note: Currently in GR enabled replication, a separate error is issued
  in case the given GTID has already been used - ER_GRP_RPL_GTID_ALREADY_USED.
  After this WL, if the specified GTID is already taken, whether it is a tagged
  or an untagged GTID, an ER_GRP_RPL_GTID_ALREADY_USED error is raised.

  Currently, the certifier tracks a set of GTID assignment blocks (controlled
  by group_replication_gtid_assignment_block_size) for the group UUID only.
  In this feature, we will track a set of GTID assignment blocks for each TSID
  where the UUID component is equal to the group UUID. This includes both the
  TSID with the group UUID and no tag, and any TSIDs with the group UUID
  and a tag.
* new feature in the COM_BINLOG_DUMP_GTID:
  The server will also make it possible to connect a new, tag-aware replica
  to an old, tag-unaware source. When the replica connects to the source and
  detects that the source's version is lower than the version that introduced
  tags, the replica will exclude tagged GTIDs from the COM_BINLOG_DUMP_GTID.
* extension of existing server functionalities:
  Existing server functionality is extended to take the tag into account,
  therefore this WL will also extend:
  - tracking of the next free GTID, which will be done for each AUTOMATIC GTID,
    whether tagged or untagged
  - checking replica GTIDs against the source at connection time - if the
    replica has more GTIDs with the source's UUID than the source itself,
    whether the TSID is tagged or untagged, the connection will be refused
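
The exclusion of tagged GTIDs for a tag-unaware source can be pictured as a
filter over the GTID set the replica sends. The Python sketch below is only an
illustration: the dictionary representation of a GTID set, the function names
and the boolean `source_supports_tags` (standing in for the version check) are
all assumptions of this example.

```python
from typing import Dict, List, Tuple

# Hypothetical GTID set model: (uuid, tag) -> list of (first, last) intervals.
GtidSet = Dict[Tuple[str, str], List[Tuple[int, int]]]


def exclude_tagged(gtid_set: GtidSet) -> GtidSet:
    """Keep only entries whose TSID has an empty tag."""
    return {tsid: ivs for tsid, ivs in gtid_set.items() if tsid[1] == ""}


def gtid_set_for_dump(gtid_set: GtidSet, source_supports_tags: bool) -> GtidSet:
    """Return the set a replica would send in COM_BINLOG_DUMP_GTID."""
    if source_supports_tags:
        return dict(gtid_set)
    return exclude_tagged(gtid_set)


if __name__ == "__main__":
    uuid = "11111111-1111-1111-1111-111111111111"
    executed = {(uuid, ""): [(1, 4)], (uuid, "admin"): [(1, 3)]}
    print(gtid_set_for_dump(executed, source_supports_tags=False))
```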

## Limitations

L1. When using a fully specified GTID, that is UUID:TAG:GNO,
the user is responsible for providing a GTID that is unique in a given
replication topology.

USER INTERFACE
===============================================================================
    VARIABLE CHANGES:
    GTID_NEXT (new values)
    
      VALUES:
        UUID:GNO (already supported value)
        AUTOMATIC (already supported value)
        ANONYMOUS (already supported value)
        AUTOMATIC:TAG (new value)
        UUID:TAG:GNO (new value)

      not changed:   
        DEFAULT: AUTOMATIC
        
        SCOPE: SESSION
        
        REPLICATED: No
        
        DYNAMIC: Yes
        
        PERSIST: NOT PERSISTABLE
        
        COMMAND LINE: NOT

      changed:
        Tagged GTID: UUID:TAG:GNO or AUTOMATIC:TAG
          PRIVILEGES:
              TRANSACTION_GTID_TAG and at least one
              of the:
              - SYSTEM_VARIABLES_ADMIN
              - SESSION_VARIABLES_ADMIN
              - REPLICATION_APPLIER

        Untagged GTID: UUID:GNO, AUTOMATIC, ANONYMOUS
          PRIVILEGES: SUPER, SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN or
            REPLICATION_APPLIER
            (no changes in privileges)
      
      DESCRIPTION (new values only):
        AUTOMATIC:TAG - The user provides a tag for the next
        transaction GTID. This GTID will be generated automatically. Server
        will generate source UUID, attach a TAG for the transaction and
        generate a transaction sequence number, unique for the pair UUID:TAG.

        UUID:TAG:GNO - The user provides a tag for the
        assigned, fully specified GTID. Server will use this GTID while
        committing the next transaction.

    group_replication_gtid_assignment_block_size
      This variable now affects not only the
      generation of GTIDs for the group UUID but also the generation
      for defined UUIDs when the user specifies a GTID_NEXT=AUTOMATIC:TAG
      GTID.

PERSISTENT STATE
===============================================================================

GTID tags are persisted everywhere GTID is persisted, which means:
- InnoDB redo log

  Before this worklog, textual representation of a Gtid used in InnoDB redo log,
  contained up to 56 characters. Gtid_info used in GTID descriptor reserved
  64 bytes to hold GTID information. This WL changes Gtid UUID encoding from
  text to binary. Reserved space remains
  the same - GTID uses up to 64 bytes to represent GTID UUID, GTID Tag length,
  GTID Tag and GNO number. This format will be given a version number 2.
  Reading functions are designed to be backward compatible with version 1.

- The gtid_executed table

  Current gtid_executed table definition is as follows:

  source_uuid CHAR(36) NOT NULL COMMENT 'uuid of the source where the
    transaction was originally executed.',
  interval_start BIGINT NOT NULL COMMENT 'First number of interval.',
  interval_end BIGINT NOT NULL COMMENT 'Last number of interval.',
  PRIMARY KEY(source_uuid, interval_start)

  New gtid_executed definition:

  source_uuid CHAR(36) NOT NULL COMMENT 'uuid of the source where the
    transaction was originally executed.',
  interval_start BIGINT NOT NULL COMMENT 'First number of interval.',
  interval_end BIGINT NOT NULL COMMENT 'Last number of interval.',
  gtid_tag CHAR(32) NOT NULL COMMENT 'GTID Tag.',
  PRIMARY KEY(source_uuid, gtid_tag, interval_start)

- Binary log

  The binary log contains new versions of the serialized Gtid_log_event
  and an updated format of events that contain GTIDs (details are contained in
  the Protocol section of the HLS).

OBSERVABILITY
===============================================================================

No changes w.r.t. existing functionality - GTID tags are presented to the user
whenever GTID of the transaction is presented, meaning:

- events present in the binary log (SHOW BINLOG/RELAYLOG EVENTS)
- SQL queries related to presenting the state of gtid_executed table. Since
  new column is added, SELECT queries that include 'gtid_tag' column will
  present new information about assigned GTID tag.
- GTID related data in performance_schema:
  - replication_connection_status:
    - LAST_QUEUED_TRANSACTION 57->90 bytes
    - QUEUEING_TRANSACTION 57->90 bytes
    - RECEIVED_TRANSACTION_SET no changes in column definition
  - replication_applier_status_by_coordinator:
    - LAST_PROCESSED_TRANSACTION 57->90 bytes
    - PROCESSING_TRANSACTION 57->90 bytes
  - replication_applier_status_by_worker:
    - APPLYING_TRANSACTION 57->90 bytes
    - LAST_APPLIED_TRANSACTION 57->90 bytes
  - replication_group_member_stats:
    - LAST_CONFLICT_FREE_TRANSACTION no changes in column definition
  - binary_log_transaction_compression_stats:
    - LAST_TRANSACTION_ID no changes in column definition
  - clone_status:
    - GTID_EXECUTED no changes in column definition
  - events_transactions_current:
    - GTID 64->90 bytes
  - events_transactions_history:
    - GTID 64->90 bytes
  - events_transactions_history_long:
    - GTID 64->90 bytes
  - status, session, global variables presented in performance schema
    (no changes)
- the following GTID functions accept GTID sets containing tags:
  GTID_SUBSET(), GTID_SUBTRACT(), 
  WAIT_FOR_EXECUTED_GTID_SET()
- SHOW statements: SHOW REPLICA/MASTER STATUS
- output of mysqlbinlog; its --include-gtids and --exclude-gtids options now
  accept tagged GTID sets
- Error messages that contain GTIDs and GTID sets may now contain tags.
  None of the existing messages specify the maximum number of characters
  to display GTID, therefore no format change is needed.
- Statements accepting GTID sets, namely START REPLICA
  UNTIL {SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS}. They now accept GTID sets
  containing tags.
- System variables other than @@gtid_next containing GTIDs or GTID sets,
  namely @@gtid_purged, @@gtid_executed, and @@gtid_owned.
  They may now contain GTIDs with tags, and SET @@gtid_purged accepts
  GTID sets containing tags. 

USER PROCEDURE
===============================================================================
  
The user is able to set GTID_NEXT for the upcoming transaction by using the
SET GTID_NEXT command. To be able to use this functionality, the user needs
to make sure that the source server runs in GTID mode. Automatic GTIDs
are assigned during the flush stage of binlog group commit or at certification
time (GR case). The default value
of GTID_NEXT is AUTOMATIC, meaning that the server automatically
assigns a GTID based on the UUID of the source - the source UUID becomes the
first component of the GTID.
To attach a tag to the transaction GTID, the user needs to execute
the command: SET GTID_NEXT='UUID:TAG:GNO'
or SET GTID_NEXT='AUTOMATIC:TAG'
keeping in mind that the change will be observed with the next committed
transaction.

Valid GTID NEXT specifications:
UUID:TAG:GNO
AUTOMATIC:TAG

Invalid GTID NEXT specifications:
::
AUTOMATIC:
  
SECURITY CONTEXT
===============================================================================

This WL introduces a new level of privileges (TRANSACTION_GTID_TAG)
under which the user may set a tag for the next transaction. Required
privileges to set GTID_NEXT with the tag are:
  - TRANSACTION_GTID_TAG and at least one of the:
    - SYSTEM_VARIABLES_ADMIN
    - SESSION_VARIABLES_ADMIN
    - REPLICATION_APPLIER

PRIVILEGE_CHECKS_USER account for the replication channel must have the
following privileges to set GTID_NEXT with a tag:
  TRANSACTION_GTID_TAG and REPLICATION_APPLIER (checked during start of the
  replication applier thread)

During the upgrade, every user with the BINLOG_ADMIN privilege will be
granted the TRANSACTION_GTID_TAG.
  
SYSTEM RESOURCES	
===============================================================================

No changes w.r.t. existing functionality
  
DEPLOYMENT and INSTALLATION
===============================================================================

No changes w.r.t. existing deployment/installation procedures.
  
PROTOCOL
===============================================================================

EXTERNAL
--------

No changes w.r.t. existing functionality

INTERNAL
--------

### Serialization framework

#### High-level description

This worklog introduces changes in Binary Log Events. At this point, the Gtid
event cannot be extended further, and a new version needs to be introduced to
meet the requirements of the new GTID representation. To ensure backward
and forward compatibility of newly defined events, a serialization framework
is introduced. The serialization framework provides methods for automatic
serialization and deserialization of event fields defined by the API user.

The serialization framework is designed to expose a simple API
to a developer that facilitates event definition, serialization and
deserialization. Instead of implementing serializing and deserializing
functions, the user specifies definitions of the fields included in the packet.
A field definition consists of:
- A number of bytes used to represent the field
- A "field missing functor", specifying what decoder should do in case the
  field is not provided in the packet. By default, decoder won't take any
  action upon a missing field.
- A "field encode predicate", specifying whether or not serializer should
  include the field in the packet. Default behavior of the encoder is to
  always include the field in the packet.
- An "unknown field policy" - action that should be taken by the decoder in case
  the field is encoded in the packet but its definition is unknown to the
  decoder (this concerns decoders older than the version of the software
  which introduced the field into the packet). We define two policies: "ignore"
  and "error". The field definition chooses one policy.
  The framework encodes the chosen policy in the packet. An old server
  decoding the packet reads the policy from the packet: if the policy is
  "ignore", it just skips the field. If the policy is "error", it reports
  an error and rejects the packet.

The idea of the serialization framework is not to introduce many new
types that can be ingested by encoding and decoding functions, but
to reuse common STL types. Types supported by serialization framework
are:
  - signed/unsigned integers (simple type)
  - floating point numbers (simple type)
  - sets,
  - maps,
  - vectors,
  - arrays (simple type) of fixed size, which cannot evolve over time
  - enumerations (simple type)
  - strings
  - custom types - nested messages, called "serializable fields"

Important design decisions are listed below:
- as message definitions evolve over time, the framework allows inserting new
  fields
- fields get automatic type codes, assigned by the serialization framework and
  used to identify fields
- fields can be effectively removed from the packet by defining a field
  encode predicate which will always return a false value. This ensures
  that removed fields don't occupy space in the output packet, while
  maintaining the auto-generated type codes and field sizes for subsequent
  fields
- for simple types, field sizes are not
  included in the binary representation of a serialized type

#### Message format specification

Serialization framework types are encoded using the following
formats:
  - signed/unsigned integers -> fl_integer_format / vl_integer_format
  - floating point numbers -> sp_fp_number_format / dp_fp_number_format
  - sets -> container_format,
  - maps -> map_container_format,
  - vectors -> container_format,
  - arrays (simple type) of fixed size -> fixed_container_format
  - enumerations -> vl_integer_format
  - strings -> string_format
  - custom types - nested messages, called "serializable fields" -> message


Message formatting is the following:

<top_level_message> ::= <serialization_version_number> <message>

top_level_message consists of:
- serialization_version_number - serialization framework version number >=0,
  1-9 bytes. Currently, the serializer writes 0 to this field. The deserializer
  stops with an error if this field has a value other than 0.
- message

<message> ::= <total_field_size> <last_non_ignorable_field_id> { <field> }

<field> ::= <field_id> <field_data>

message consists of:
- total_field_size - Size of serializable field payload (1-9 bytes).
  Size information is used to calculate serializable type boundaries within
  a packet and also to skip unknown fields in the packet in case it is
  encoded by encoder of version newer than the version of decoder
- last_non_ignorable_field_id - (1-9 bytes). This is a last encoded,
  non-ignorable field id. In case this field id is unknown to current version
  of decoder, it will generate an error. In case all fields in the packet
  are ignorable, last non-ignorable field id will be equal to 0.
- field: Each "field" contains the metadata and the value for one field in
  the packet.
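
The role of last_non_ignorable_field_id can be illustrated with a short check
performed before decoding. This Python sketch is a simplification and not the
framework's code: the function name and the `highest_known_field_id` parameter
are hypothetical, and it assumes field ids are assigned sequentially, so that a
decoder knows every id up to the highest one it was built with.

```python
def check_last_non_ignorable(last_non_ignorable_field_id: int,
                             highest_known_field_id: int) -> None:
    """Reject a message that carries a mandatory field unknown to this decoder.

    Fields with ids above highest_known_field_id are unknown to the decoder.
    They may be skipped only if all of them are ignorable, which is the case
    exactly when last_non_ignorable_field_id is not among them.
    """
    if last_non_ignorable_field_id > highest_known_field_id:
        raise ValueError(
            "message contains non-ignorable field "
            f"{last_non_ignorable_field_id}, unknown to this decoder")


if __name__ == "__main__":
    check_last_non_ignorable(3, highest_known_field_id=5)  # accepted
    try:
        check_last_non_ignorable(7, highest_known_field_id=5)
    except ValueError as error:
        print("rejected:", error)
```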

field consists of:
- auto-generated field_id >=0, sequence number which identifies fields
- fields formatted according to the field_data format

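<field_data> ::= <vl_integer_format> | <fl_integer_format> | <fp_number_format> | <string_format> | <container_format> | <fixed_container_format> | <map_format> | <message>

<fp_number_format> ::= <sp_fp_number_format> | <dp_fp_number_format>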

field_data can be one of:

- vl_integer_format - format used to encode signed/unsigned integers using
  1-9 bytes, depending on the integer value. Format is described in detail
  in the [Variable-length integers](#variable-length-integers).
- fl_integer_format - fixed-length integer, encoded using a fixed number of
  bytes, signed or unsigned
- fp_number_format - floating point number field:
  - sp_fp_number_format - single precision floating point number, using 4 bytes
  - dp_fp_number_format - double precision floating point number, using 8 bytes
- string_format - string field, see below
- container_format - unlimited-size container field, see below
- fixed_container_format - fixed-length container format, see below
- map_format - unlimited size container with key-value pairs, see below
- message - nested message, see below

Consecutive bytes of any number (integer, variable-length integer, floating
point number) are stored using the little-endian convention.

<map_format> ::= <number_elements> { <field_data> <field_data> }

Map format consists of:
- number_elements - number of elements in the container >=0, encoded in
  unsigned vl_integer_format 
- key-value pairs formatted according to field_data format

<container_format> ::= <number_elements> { <field_data> }

Container format consists of:
- number_elements - number of elements in the container >=0, encoded in
  unsigned vl_integer_format 
- field_data - encoded values, according to field_data format

<fixed_container_format> ::= { <field_data> }+

Fixed-size array format consists of:
- encoded values, formatted according to field_data format

The number of elements is defined at compile-time and
cannot be changed when the message format evolves

<string_format> ::= <string_length> { <character> }

String format consists of:
- string_length - the number of 1 byte elements in the string, encoded in
  unsigned vl_integer_format 
- string characters, 1 byte unsigned integers

#### Variable-length integers

Variable-length integers are encoded using 1-9 bytes, depending on the
value of a particular field. Bytes are always stored in little-endian (LE)
byte order.

This format allows the decoder to determine the total number of bytes to read
after reading only the first byte. Therefore, the decoder will perform
a maximum of 2 reads to read a variable-length integer.

The lowest bits of the first byte encode
how many consecutive bytes represent the number. This length is equal to one
plus the number of trailing 1-bits before the first 0-bit. Therefore, it may
range from 1 to 9, inclusive.
When the number itself uses at most 56 bits, the trailing ones are
followed by a bit equal to "0" and then by the bits of the number; otherwise
all eight bits of the first byte are ones and it is followed by the full number.
The special case for 57..64 bits allows us to use only 9 bytes to store numbers
where the 63rd bit is "1". (An encoding without such a special case would
require 10 bytes.)
For readability, in the text below, we display bytes in big-endian format.
Within each byte, we display the most significant bit first. Encoding
is explained reading bits from right to left.
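
The description above can be turned into a short encoder and decoder for
unsigned values. This is a Python sketch written from the prose in this
section, not the server's implementation; the function names are hypothetical.
It assumes that for lengths 1 to 8 the value is shifted left by the length and
combined with the length marker, and that for length 9 a first byte of all
ones is followed by the 8-byte little-endian value.

```python
from typing import Tuple


def encoded_size(value: int) -> int:
    """Number of bytes (1-9) needed for an unsigned 64-bit value."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 unsigned bits")
    bits = value.bit_length()
    if bits > 56:
        return 9                      # special case: 57..64 significant bits
    return max(1, (bits + 6) // 7)    # each byte carries 7 payload bits


def encode_unsigned(value: int) -> bytes:
    size = encoded_size(value)
    if size == 9:
        return b"\xff" + value.to_bytes(8, "little")
    length_marker = (1 << (size - 1)) - 1   # size-1 trailing ones, then a 0
    return ((value << size) | length_marker).to_bytes(size, "little")


def decode_unsigned(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed); the first byte alone gives the length."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    size = 1
    while size <= 8 and first & (1 << (size - 1)):
        size += 1
    if len(data) < size:
        raise ValueError("truncated variable-length integer")
    if size == 9:
        return int.from_bytes(data[1:9], "little"), size
    return int.from_bytes(data[:size], "little") >> size, size


if __name__ == "__main__":
    encoded = encode_unsigned(65535)
    print(encoded.hex(), decode_unsigned(encoded))
```

Under these assumptions, 65535 is encoded as the bytes FB FF 07, which is the
bit pattern "00000111 11111111 11111011" when displayed in big-endian order.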

- unsigned integers:
  Encoded length of the integer is followed by encoded value. Examples:

  "00000111 11111111 11111011" - BE byte order, bit layout:
    most significant bit first

    65535 is represented using 3 bytes. The rightmost byte contains two
    trailing ones followed by 0 (3 least significant bits - 3 bytes used to
    store the number). The remaining bits are used to store the value.

- signed integers:
  Signed integers are encoded such that both positive and negative numbers
  use fewer bits the smaller their magnitude is. The least significant bit
  is the sign and the remaining bits represent the number. If x is nonnegative,
  then we encode (x<<1), cast to unsigned. If x is negative, then we encode
  ((-(x + 1) << 1) | 1), cast to unsigned. That is, we first add 1 to shift
  the range from -2^63..-1 to -(2^63-1)..0; then we negate the result to get
  a nonnegative number in the range 0..2^63-1; then we shift the result left
  by 1 to make place for the sign bit; then we "or" with the sign bit.
  The resulting number is reinterpreted as unsigned and serialized accordingly.

  "00001111 11111111 11110011" - BE byte order, bit layout:
    starting from the most significant bit

    65535 is represented using 3 bytes. The rightmost byte contains two
    trailing ones followed by 0 (3 least significant bits - 3 bytes used to
    store the number). After 0, we encode one sign bit
    (equal to 0). The remaining bits are used to store the value.

  "00001111 11111111 11101011" - BE byte order, bit layout:
    starting from the most significant bit

    -65535 is represented using 3 bytes. The rightmost byte contains two
    trailing ones followed by 0 (3 least significant bits - 3 bytes used to
    store the number). After 0, we encode one sign bit
    (equal to 1). The remaining bits are used to store the negated value,
    minus 1.

  "00001111 11111111 11111011" - BE byte order, bit layout:
    starting from the most significant bit

    -65536 is represented using 3 bytes. Read from right to left, the rightmost
    byte starts with two ones followed by a 0 (the 3 least significant bits,
    meaning that 3 bytes are used to store the number). After the 0, we encode
    one sign bit (equal to 1). The remaining bits store the negated value,
    minus 1.
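
The four examples above can be reproduced with a minimal C++ sketch. It is
not the framework's implementation: the names Encoded_sketch,
encode_unsigned_sketch and encode_signed_sketch are hypothetical, the result
is returned as one integer instead of being written to a buffer, and only
values that need at most 8 encoded bytes are handled.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical result type: the low `size` bytes of `bits` hold the encoding.
struct Encoded_sketch {
  std::uint64_t bits;
  int size;
};

// Length marker: (size - 1) ones followed by a zero in the least significant
// bits; the remaining bits hold the value (7 value bits per encoded byte).
// Assumption: values of 2^56 and above (9-byte encodings) are not handled.
inline Encoded_sketch encode_unsigned_sketch(std::uint64_t value) {
  if ((value >> 56) != 0) throw std::out_of_range("needs more than 8 bytes");
  int size = 1;
  while ((value >> (7 * size)) != 0) ++size;
  const std::uint64_t length_marker = (std::uint64_t{1} << (size - 1)) - 1;
  return {(value << size) | length_marker, size};
}

// Sign goes to the least significant bit, the magnitude to the other bits:
// x >= 0 is mapped to (x << 1), x < 0 is mapped to ((-(x + 1)) << 1) | 1.
inline Encoded_sketch encode_signed_sketch(std::int64_t x) {
  const std::uint64_t sign_and_magnitude =
      x >= 0 ? static_cast<std::uint64_t>(x) << 1
             : (static_cast<std::uint64_t>(-(x + 1)) << 1) | 1;
  return encode_unsigned_sketch(sign_and_magnitude);
}
```

Under these assumptions, encode_unsigned_sketch(65535) yields the bit pattern
"00000111 11111111 11111011" shown for the unsigned case.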

### Changes in Binary Log Events

#### Modification of the Gtid Event

At this point, the Gtid event contains the optional commit group ticket field
and cannot be extended any further. The layout of the event is as
follows:

    +------------+
    |     1 byte | Flags
    +------------+
    |    16 bytes| Encoded SID
    +------------+
    |     8 bytes| Encoded GNO
    +------------+
    |     1 byte | lt_type
    +------------+
    |     8 bytes| last_committed
    +------------+
    |     8 bytes| sequence_number
    +------------+
    |  7/14 bytes| timestamps*
    +------------+
    |1 to 9 bytes| transaction_length (see net_length_size())
    +------------+
    |   4/8 bytes| original/immediate_server_version (see timestamps*)
    +------------+
    |     8 bytes| Commit group ticket
    +------------+ 

The tagged version of the Gtid_event will receive a new event typecode,
GTID_TAGGED_LOG_EVENT (42). Gtid_event will be serialized / deserialized
using a framework that enables further extension of the event. Each
of the existing fields will receive its own type code, assigned
automatically by the serialization framework, and the layout will be as
follows:

    +------------+
    |  1-9 bytes | Serialization framework version (automatic)
    +------------+
    |  1-9 bytes | Size of below payload (automatic)
    +------------+
    |  1-9 bytes | Last non-ignorable field id (automatic), in case server does not know this field id, it will return an error. This field is equal to the commit group ticket typecode.
    +------------+
    |  1-9 bytes | Flags typecode
    +------------+
    |  1-9 bytes | Flags
    +------------+
    |  1-9 bytes | Encoded SID typecode
    +------------+
    |    16 bytes| Encoded SID
    +------------+
    |  1-9 bytes | Encoded GNO typecode
    +------------+
    |  1-9 bytes | Encoded GNO
    +------------+
    |  1-9 bytes | GTID tag typecode
    +------------+
    |  1-9 bytes | GTID tag length
    +------------+
    |  1-32 bytes| GTID tag
    +------------+
    |  1-9 bytes | last_committed typecode
    +------------+
    |  1-9 bytes | last_committed
    +------------+
    |  1-9 bytes | sequence_number typecode
    +------------+
    |  1-9 bytes | sequence_number
    +------------+
    |  1-9 bytes | immediate_commit_timestamp typecode
    +------------+
    |  1-9 bytes | immediate_commit_timestamp
    +------------+
    |  1-9 bytes | original_commit_timestamp typecode
    +------------+
    |  1-9 bytes | original_commit_timestamp
    +------------+
    |  1-9 bytes | transaction_length typecode
    +------------+
    |  1-9 bytes | transaction_length
    +------------+
    |  1-9 bytes | immediate_server_version typecode
    +------------+
    |  1-9 bytes | immediate_server_version
    +------------+
    |  1-9 bytes | original_server_version typecode
    +------------+
    |  1-9 bytes | original_server_version
    +------------+
    |  1-9 bytes | Commit group ticket typecode
    +------------+
    |  1-9 bytes | Commit group ticket
    +------------+ 

The first three fields are automatic "inner packet" overhead used to
ensure forward/backward compatibility with future versions of the
event. Packet integer fields are saved as variable-length integers,
whose length is 1-9 bytes depending on the value of the field, as
explained in the "Serialization framework" subsection. Since this is the first
decoder version that uses the serialization framework, the decoder knows all
of the field ids; all packet fields are defined as ignorable and this
information won't be used. Field typecodes are also represented using
variable-length integers. Since we have a dozen fields in the packet, all
typecodes will be encoded using 1 byte.

#### New GTID set encoding, binary format:

There are two formats of GTID set encoding:
- untagged GTIDs format, used when GTID set contains only untagged GTIDs
- tagged GTIDs format, used when GTID set contains tagged GTIDs

Type code 0 (8 bit integer):
  - value 1: tagged or untagged GTIDs format
  - value 0: untagged GTIDs format
Number of TSIDs (56 bit integer for untagged GTIDs format, 48 bit integer for
tagged GTIDs format, little endian)
Type code 1 (8 bit integer):
  - value 0: untagged GTIDs format
  - value 1: tagged GTIDs format
for each TSID:
    UUID (16 bytes)
    length of the tag (8 bits)
    tag (as many bytes as the length of the tag)
    number of intervals (64 bits)
      for each interval:
        interval start (64 bits)
        interval end (64 bits)

GTIDs are encoded in the following places:
 - Previous_gtids_log_event
 - Transaction_context_log_event
 - View_change_log_event
 - The packet broadcast in a GR group which contains gtid_executed

#### New GTID encoding, text format:

Text format of a GTID is the following:

UUID:NUMBER

Tagged format of a GTID is used whenever GTID contains a tag:

UUID[:TAG]:NUMBER

where TAG is a text that matches the following regular expression:

[a-z_][a-z0-9_]{0,31}
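
As an illustration of the tag rules (user input is accepted case
insensitively and normalized to lower case, as stated in the executive
summary), a minimal C++ sketch could look as follows. The helper name
normalize_tag_sketch is hypothetical and does not correspond to a server
function.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Returns the normalized (lower-case) tag, or std::nullopt when `text` does
// not match ^[a-zA-Z_][a-zA-Z0-9_]{0,31}$. Hypothetical helper for
// illustration only.
inline std::optional<std::string> normalize_tag_sketch(std::string_view text) {
  constexpr std::size_t max_tag_length = 32;
  if (text.empty() || text.size() > max_tag_length) return std::nullopt;
  std::string normalized;
  normalized.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char c = text[i];
    const bool is_lower = c >= 'a' && c <= 'z';
    const bool is_upper = c >= 'A' && c <= 'Z';
    const bool is_digit = c >= '0' && c <= '9';
    // A tag must start with a letter or an underscore.
    if (is_digit && i == 0) return std::nullopt;
    if (!(is_lower || is_upper || is_digit || c == '_')) return std::nullopt;
    normalized.push_back(is_upper ? static_cast<char>(c - 'A' + 'a') : c);
  }
  return normalized;
}
```

The normalized output always matches the lower-case expression above.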

#### New GTID set encoding, text format:

gtid_set:
    uuid_set [, uuid_set] ...
    | ''

    Within a gtid_set, uuids appear in alphabetical order

uuid_set:
    uuid[:interval_list][:tag_intervals][:tag_intervals]...

    Within a uuid_set, tags appear in alphabetical order

interval_list:
    interval[:interval]...
 
    Within an interval_list, intervals appear in increasing order and there
    is always a gap between two adjacent intervals

tag_intervals:
    tag:interval_list

uuid:
    hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh

h:
    [0-9a-f]

tag:
    [a-z_][a-z0-9_]{0,31}

interval:
    n[-n]
    (n >= 1)

The definition of the gtid_set presented above ensures that the text
representation of a gtid_set is unique; therefore, two sets are equal if and
only if their string representations are equal.

When the system is ingesting gtid_set in text format prepared by the user:
-  Within a gtid_set, uuids may be repeated, may appear in any order, and are
   case-insensitive.
-  Within a uuid_set, tags may be repeated, may appear in any order. Also,
   tags are case-insensitive.
-  Within an interval_list, intervals may overlap, repeat and appear in any
   order. Also, interval_list may be empty.
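
The step from a user-provided interval_list (overlapping, repeated, in any
order) to the canonical form (increasing order, always a gap between adjacent
intervals) can be sketched in C++ as below. The names Interval_sketch and
normalize_intervals_sketch are hypothetical, intervals are taken as inclusive
[start, end] pairs, and the sketch assumes that interval ends stay below the
maximum value of a 64-bit signed integer.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive interval [first, second]; "5" is represented as {5, 5}.
using Interval_sketch = std::pair<std::int64_t, std::int64_t>;

// Sorts the intervals and merges those that overlap or touch, so that the
// result is in increasing order with a gap between adjacent intervals.
inline std::vector<Interval_sketch> normalize_intervals_sketch(
    std::vector<Interval_sketch> intervals) {
  std::sort(intervals.begin(), intervals.end());
  std::vector<Interval_sketch> result;
  for (const Interval_sketch &interval : intervals) {
    if (!result.empty() && interval.first <= result.back().second + 1) {
      // Overlapping or adjacent: extend the previous interval.
      result.back().second = std::max(result.back().second, interval.second);
    } else {
      result.push_back(interval);
    }
  }
  return result;
}
```

For example, the input "5-7:1-3:2-4:9:5" corresponds to the canonical
interval_list "1-7:9".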

### COM_BINLOG_DUMP_GTID

COM_BINLOG_DUMP_GTID uses the new GTID set encoding described in the
"New GTID set encoding, binary format" section above. In case the version
of the source is lower than the version that introduced tags,
the replica will exclude tagged GTIDs from the packet sent to the source.

### Changes in GR communication

#### Modification of the Transaction_prepared_message

The current layout of this message is the following:

  +------------+
  |     1 byte | GNO typecode (PIT_TRANSACTION_PREPARED_GNO)
  +------------+
  |     8 bytes| GNO
  +------------+
  |     1 byte | SID typecode (PIT_TRANSACTION_PREPARED_SID) optional
  +------------+
  |    16 bytes| SID (encoded) optional
  +------------+

New format of Transaction_prepared_message:

  +------------+
  |     1 byte | GNO typecode (PIT_TRANSACTION_PREPARED_GNO)
  +------------+
  |     8 bytes| GNO
  +------------+
  |     1 byte | SID typecode (PIT_TRANSACTION_PREPARED_SID) optional
  +------------+
  |    16 bytes| SID (encoded) optional
  +------------+
  |     1 byte | Tag typecode (PIT_TRANSACTION_PREPARED_TAG) optional
  +------------+
  |     1 byte | Tag length optional (value 1-32, encoded using 1 byte)
  +------------+
  | tag length | Tag optional
  +------------+


FAILURE MODEL SPECIFICATION
===============================================================================

Failure model specification remains unchanged. New error messages are
introduced and listed in the section below.

## NEW ERRORS

  1. In case GTID_MODE is set to OFF or OFF_PERMISSIVE, the server shall
     disallow setting GTID_NEXT to AUTOMATIC:TAG and issue
     ER_CANT_SET_GTID_NEXT_TO_AUTOMATIC_TAGGED_WHEN_GTID_MODE_IS_OFF:
      eng "@@SESSION.GTID_NEXT cannot be set to AUTOMATIC:TAG when
           @@GLOBAL.GTID_MODE = OFF or OFF_PERMISSIVE"

  2. Serialization framework uses internal error codes. Currently, in case
     serialization or deserialization fails, software will generate
     a context-dependent error. Internal Binlog_read_error is extended to
     handle event formats that come from future server versions. Internal
     message appended to the context-dependent error will be the following:

     "Unrecognized event format. The event appears to originate from a future
     server version"

  3. In case GR failed to decode a tag in the Transaction_prepared_message,
     Group replication will report the ER_GRP_RPL_MSG_DECODING_FAILED:

     eng "Failed to decode Group Replication message:
     Transaction_prepared_message. Reason : Failed to decode a tag,
     wrong format"

     and will send an error packet to the applier with
     ER_GRP_RPL_APPLIER_ERROR_PACKET_RECEIVED:

     eng "The Applier process of Group Replication found an error and was
     requested to stop: Failure when processing a received Transaction package
     from the communication layer"

  4. In case the user does not have sufficient privileges to execute / apply
     a SET GTID_NEXT statement with a tag,
     the server will generate:

     ER_SPECIFIC_ACCESS_DENIED
       eng "Access denied; you need %-.256s privilege(s) for this operation"

    with the following error message:

      eng "Access denied; you need the TRANSACTION_GTID_TAG and at least one
      of the: SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN
      REPLICATION_APPLIER privilege(s) for this operation"
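
The privilege combination named in the error message of item 4 can be
expressed as a small C++ predicate. This is a sketch under the assumption
that the granted privileges are available as a set of names; the function
name may_set_tagged_gtid_next_sketch is hypothetical and the server's actual
access-control code is not shown in this document.

```cpp
#include <set>
#include <string>

// True when TRANSACTION_GTID_TAG is granted together with at least one of
// SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN or REPLICATION_APPLIER.
inline bool may_set_tagged_gtid_next_sketch(
    const std::set<std::string> &granted) {
  const bool has_tag_privilege = granted.count("TRANSACTION_GTID_TAG") > 0;
  const bool has_companion_privilege =
      granted.count("SYSTEM_VARIABLES_ADMIN") > 0 ||
      granted.count("SESSION_VARIABLES_ADMIN") > 0 ||
      granted.count("REPLICATION_APPLIER") > 0;
  return has_tag_privilege && has_companion_privilege;
}
```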

UPGRADE/DOWNGRADE and CROSS-VERSION REPLICATION
===============================================================================

In async replication, it is supported for a new replica to connect to an
old source. If a new replica has committed tagged transactions, and needs to
connect to an old source that is tag-unaware, the COM_BINLOG_DUMP_GTID packet
that the new replica sends to the old source has to be compatible with the old
source. Therefore, the new replica has to check the server version of the
source; if it is from an old version that is tag-unaware, the replica should
exclude tagged GTIDs from the GTID set.
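
A minimal C++ sketch of this exclusion is shown below. It assumes a
simplified, hypothetical in-memory form of the GTID set
(Tsid_intervals_sketch) and leaves the version comparison to the caller,
because the exact version that introduces tags is not stated here.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified entry of a GTID set: one TSID and its intervals.
struct Tsid_intervals_sketch {
  std::string uuid;
  std::string tag;        // empty for untagged GTIDs
  std::string intervals;  // text form, e.g. "1-5:9"
};

// Returns the GTID set to place in the COM_BINLOG_DUMP_GTID packet. For a
// tag-unaware source, entries with a non-empty tag are left out.
inline std::vector<Tsid_intervals_sketch> gtid_set_for_dump_sketch(
    const std::vector<Tsid_intervals_sketch> &gtid_executed,
    bool source_is_tag_aware) {
  if (source_is_tag_aware) return gtid_executed;
  std::vector<Tsid_intervals_sketch> untagged_only;
  std::copy_if(
      gtid_executed.begin(), gtid_executed.end(),
      std::back_inserter(untagged_only),
      [](const Tsid_intervals_sketch &entry) { return entry.tag.empty(); });
  return untagged_only;
}
```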

In async replication, having a source server that is newer than the replica
is unsupported. This is not enforced by any fence; it is possible for an old
replica to connect to a new source. In this case, the source will attempt to
replicate to the old replica. If the new source commits tagged transactions,
an old, tag-unaware replica will observe that as an unknown and non-ignorable
log event, the new Gtid_log_event. The replica then stops with an error. The
user can recover from this situation by upgrading the replica and starting
replication again.

When running a replication topology with Group Replication enabled,
GTID tags may be used only when all servers in the replication topology run
a version equal to or greater than the one in which GTID tags were introduced.

During the upgrade, every user with the BINLOG_ADMIN privilege will be
granted the TRANSACTION_GTID_TAG privilege.

During the upgrade, the internal gtid_executed table will be extended with
an additional gtid_tag column, which will be the second component of the
table primary key.

BEHAVIOR CHANGE
===============================================================================

No changes w.r.t. existing functionality

SUMMARY OF THE APPROACH
===============================================================================

Note for the reader:

Although all SW components are using the same GTID interface consistently in
the code, the GTID generation algorithms are different for GR and asynchronous
replication. Therefore, algorithms in this worklog are separately defined for
"GR case" and "asynchronous replication case". 

This description contains the summary of the approach. More details can be
found in the sections of this LLD below.

The scope of this worklog is changing the implementation of the GTID
representation in the system. Optional GTID tags are introduced,
which may be assigned by a user with specific privileges.
Changes include: adjusting the implementation of the GTID (adding the Tag
component), changes in GTID generation at commit time in asynchronous
replication, changes in GTID generation at certification time in GR, GTID
reading and writing, and encoding and decoding of GTID related events. This
worklog also requires changes in all of the places that propagate or present
information about the transaction GTID, which include, amongst others: GTID
related events, GTID related system messages, the mysqlbinlog client, the
InnoDB redo log and commands presenting information about the current system
state that include information about transaction GTIDs.

Steps of this worklog are the following:

Step 1. Forward and backward compatible serialization framework
Step 2. Changing representation of the GTID
Step 3. Changes in GTID handling/generation in the Binlog Group Commit
Step 4. Changes in GTID handling/generation in the GR plugin
Step 5. Protect execution of GTID_NEXT 'tagged' with privileges
Step 6. Allow GTID_NEXT with a tag to run under certain GTID modes. Allow
        changing GTID mode to compatible mode in case sessions with tag
        assigned are running.
Step 7. Rename remaining '*sid*' related type names, function names and variables into '*tsid*'
Step 8. Keep track of the next_free_gno for AUTOMATIC tagged GTIDs
Step 9. Extend checking replica GTIDs during connection to the source

## Step 1. Forward and backward compatible serialization framework

This step is focused on providing a simple user API for defining
a serializable class that can change in the future. The background for this
step is the Gtid_event, which cannot be extended at this point.
The Gtid_event class will receive an additional
optional field related to the GTID tag, but this time serialization and
deserialization of the event will be forward and backward compatible.

The serialization framework will provide a 'Serializable' interface that needs
to be implemented by the API user. The API user needs to define two methods:
a non-const 'define_fields' object method used during deserialization
and a const 'define_fields' object method used during serialization. These
methods return definitions of fields with references to internal class
fields. After this step, the API user may choose a defined serializer type
and call the 'serialize' or 'deserialize' method to obtain the byte
representation of the event, without the need to specify how specific fields
are saved into memory.

After extension of the existing serializable type, the API user may decide what
will happen in case older version of the software receives unknown fields.
Default behavior is "ignore", the API user may change this behavior to "generate
an error". In that case, older server will generate an error and
behave in crash-stop manner.

## Step 2. Changing representation of the GTID

This step includes all of the changes needed in the internal representation of
the GTID and the processing of the new, tagged GTID type. In this step, the
tag is assumed to remain empty at GTID generation time.

### Introduction

GTID functionalities are spread across several classes
placed in the server libraries, the binlog events library and the GR plugin
related libraries. They already contain functions dedicated to:
- GTID parsing ("from string"/"to string")
- Accessing the GTID data. UUID is represented either by string or a unique
number associated to the UUID string generated by the current mysqld process,
"rpl_sidno". GTID number, GNO is usually represented by the "rpl_gno" integer.
- GTID encoding / decoding functions (saving/loading GTID related data in
binary format).

GTID generation functions are defined separately, as external functionality
related to the GTID. However, there is no dedicated class for GTID generation.

### Planned changes to GTID interface

#### GTID definition

Definition of the UUID is extended to also hold information about the defined
tag (which may be empty). TSID is mapped into the SIDNO; therefore,
transactions with the same source UUID but a different tag will be mapped to
a separate SIDNO.

Before this WL, parsing of the GTID was implemented in one function,
which parsed both the UUID and the GNO. Now, it will consist of three
functions related to:
* parsing of the UUID
* parsing of the transaction tag
* parsing of the GNO

#### GTID specification changes and GTID type changes

GTID specification parsing functions need to account for the new functionality,
meaning they need to allow the user to pass defined tag to system to be
processed and changed into TSID.

### Integration with server code

All the places that utilize information about GTID UUID, will now be changed
to utilize information about GTID TSID.

### Gtid persistent state

Changes in GTID persistent state cover:
- changes in GTID executed table
- changes in InnoDB Redo log
- changes in binlog events

## Step 3. Changes in GTID handling/generation in the Binlog Group Commit

GTID is assigned automatically during the Binlog Group Commit flush stage. The
GTID assignment code acquires locks associated with the global SID map object
that is defined in the current mysqld process and with the UUID related SIDNO.
The UUID related SIDNO lock needs to be held while calling the GTID generation
code. In case the user specifies the full GTID for the next transaction, the
lock related to the defined SIDNO is taken and released in the GTID generation
code. After this WL, the GTID generation code needs to account for the fact
that the UUID related lock might not be the only SIDNO lock that is held for
a "longer time". The extended locking mechanism is implemented in order to
avoid repeatedly locking and unlocking a SIDNO lock in case several
transactions use the same SIDNO lock (typical case for AUTOMATIC_GTID, but
also for the newly introduced tagged AUTOMATIC_GTID). Before this WL, the
"extended locking" mechanism was used only for the UUID related SIDNO lock.
Now, we may have several SIDNO locks, but we want to keep the same
optimization to reduce the number of calls to lock/unlock functions.

Therefore, we introduce an additional class, called Locked_sidno_set, that
will be used to lock not-already-owned SIDNO locks and unlock them
automatically after the binlog flush stage is over.
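
The idea behind Locked_sidno_set can be sketched in C++ as a scope-bound set
of held locks. The class below is an illustration only: it uses std::mutex
and a vector index in place of the server's SIDNO locks, and the member names
add_lock_for_sidno and size are assumptions rather than the actual interface.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <vector>

// Locks each SIDNO lock at most once and releases all of them when the
// object goes out of scope (for example, at the end of the flush stage).
class Locked_sidno_set_sketch {
 public:
  explicit Locked_sidno_set_sketch(std::vector<std::mutex> &sidno_locks)
      : sidno_locks_(sidno_locks) {}
  Locked_sidno_set_sketch(const Locked_sidno_set_sketch &) = delete;
  Locked_sidno_set_sketch &operator=(const Locked_sidno_set_sketch &) = delete;

  ~Locked_sidno_set_sketch() {
    for (std::size_t sidno : owned_) sidno_locks_[sidno].unlock();
  }

  // Returns true when the lock was newly taken, false when already owned.
  bool add_lock_for_sidno(std::size_t sidno) {
    if (owned_.count(sidno) != 0) return false;
    sidno_locks_[sidno].lock();
    owned_.insert(sidno);
    return true;
  }

  std::size_t size() const { return owned_.size(); }

 private:
  std::vector<std::mutex> &sidno_locks_;
  std::set<std::size_t> owned_;
};
```

With such a helper, a group of transactions that share a SIDNO pays for one
lock/unlock pair per flush stage instead of one pair per transaction.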

## Step 4. Changes in GTID handling/generation in the GR plugin

In GR, the GTID is transmitted before the transaction reaches GTID generation
in the Binlog Group Commit flush stage. Therefore, GR contains its own
code for handling generation of the GTID. Until now, GR either certified
automatically generated GTIDs using the group UUID (group SIDNO), or a given
GTID was assigned to the received transaction. Existing certifier functions
need to be changed to account for GTIDs generated with a tag
specified by the user, which means handling various transaction SIDNOs
(that correspond to different TSIDs).

Changes in GR cover:

### Changes in Certifier functions:

Before this WL, Certifier kept track of:
- GTID free intervals for the group UUID
- currently "allocated" GTID interval used for each member.

Certifier allocated GTID intervals in blocks, with specified block size.
After introduction of this WL, functionality is extended to handle
not only currently used group SIDNO, but all TSIDs
which correspond to different SIDNOs.

In case the GR view changes, the certifier updates the list of available
GTID intervals for the VCLE SIDNO. Allocated blocks are cleared out and
allocated again on demand (by the first transaction that uses the UUIDs
defined by the user). In case the current GTID interval is exhausted, a new
block is allocated for the given TSID (SIDNO) and the member from which the
transaction originates.
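
A much simplified C++ sketch of block allocation per (SIDNO, member) pair is
given below. It is not the certifier's algorithm: Gno_block_allocator_sketch
and its members are hypothetical, GNOs are handed out from a plain counter
per SIDNO, and unused GNOs of cleared blocks are not returned to the pool,
whereas the certifier works with the list of available GTID intervals.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

class Gno_block_allocator_sketch {
 public:
  explicit Gno_block_allocator_sketch(std::int64_t block_size)
      : block_size_(block_size) {}

  // Returns the next GNO for the given SIDNO and member. A new block is
  // reserved on first use and whenever the current block is exhausted.
  std::int64_t next_gno(int sidno, const std::string &member) {
    Block &block = blocks_[{sidno, member}];
    if (block.next > block.end) {
      std::int64_t &next_free = next_free_[sidno];
      if (next_free == 0) next_free = 1;  // GNOs start at 1
      block.next = next_free;
      block.end = next_free + block_size_ - 1;
      next_free = block.end + 1;
    }
    return block.next++;
  }

  // On a view change, allocated blocks are dropped and re-created on demand.
  void on_view_change() { blocks_.clear(); }

 private:
  struct Block {
    std::int64_t next = 1;
    std::int64_t end = 0;  // next > end means "no block reserved"
  };
  std::int64_t block_size_;
  std::map<std::pair<int, std::string>, Block> blocks_;
  std::map<int, std::int64_t> next_free_;
};
```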

All of the used structures are changed according to the planned design,
including the functions that call functions on objects of the associated
types. More information can be found in the "Detailed implementation" section.

### Changes in certification handler:

The certification handler needs to utilize additional information about the
GTID to be generated. Until now, the certification handler used the
"is_specified" value of the Transaction context log event. After introduction
of this worklog, the certification handler will also need to check whether the
Gtid_log_event contains information about a defined tag and propagate it to
the 'certify' function.

## Step 5. Protect execution of GTID_NEXT 'tagged' with privileges

This step implements a new privilege: TRANSACTION_GTID_TAG.
SET GTID_NEXT=UUID:TAG:NUMBER and SET GTID_NEXT=AUTOMATIC:TAG
are able to run under the privileges specified in the "USER INTERFACE" section
of the HLS.

PRIVILEGE_CHECKS_USER account for the replication channel must have the
following privileges to set GTID_NEXT with a tag:
  TRANSACTION_GTID_TAG and REPLICATION_APPLIER (checked during start of the
  replication applier thread)

## Step 6. Allow GTID_NEXT with a tag to run under certain GTID modes. Allow changing GTID mode to compatible mode in case sessions with tag assigned are running.

This step implements:

- Allowing GTID_NEXT with a tag to be set under the allowed GTID modes
- Allowing changing GTID mode to compatible mode in case there are sessions with GTID_NEXT set to AUTOMATIC:TAG

## Step 7. Rename remaining '*sid*' related type names, function names and variables into '*tsid*'

This step performs renaming of type names, function names and variables
containing "sid" and using TSID definitions into "tsid".

## Step 8. Keep track of the next_free_gno for AUTOMATIC tagged GTIDs

This step extends the GTID assignment optimization used in case the UUID of
the transaction is the server UUID. The 'next_free_gno' variable is
extended to an unordered map, which tracks the next free transaction
numbers for multiple SIDNOs.
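
The change from a single counter to a map keyed by SIDNO can be sketched in
C++ as follows. The class Next_free_gno_sketch and its take method are
hypothetical, and the sketch only models the per-SIDNO hint; checking the
candidate number against the set of executed GTIDs is outside its scope.

```cpp
#include <cstdint>
#include <unordered_map>

// Tracks, per SIDNO, the next transaction number to try for automatically
// generated GTIDs. Numbers start at 1 for a SIDNO that was not seen before.
class Next_free_gno_sketch {
 public:
  // Returns the current hint for `sidno` and advances it by one.
  std::int64_t take(int sidno) {
    auto position = next_free_gno_.emplace(sidno, 1).first;
    return position->second++;
  }

 private:
  std::unordered_map<int, std::int64_t> next_free_gno_;
};
```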

## Step 9. Extend checking replica GTIDs during connection to the source

At connection time, the source checks that the replica
doesn't have GTIDs with the source's UUID that the
source does not have. This step extends the check to
also cover tagged GTIDs with the source's UUID.
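
The extended check can be sketched in C++ as a subset test over every TSID
that carries the source's UUID, whatever its tag. The representation below
(Gtid_set_sketch, a map from a (uuid, tag) pair to a set of transaction
numbers) and the function name are assumptions made for the example; the
server works on interval-based GTID sets.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical GTID set: (uuid, tag) -> transaction numbers.
using Gtid_set_sketch =
    std::map<std::pair<std::string, std::string>, std::set<std::int64_t>>;

// Returns false when the replica holds a GTID with the source's UUID (tagged
// or untagged) that the source itself does not have.
inline bool replica_gtids_known_to_source_sketch(
    const Gtid_set_sketch &replica, const Gtid_set_sketch &source,
    const std::string &source_uuid) {
  for (const auto &[tsid, replica_gnos] : replica) {
    if (tsid.first != source_uuid) continue;
    if (replica_gnos.empty()) continue;
    const auto found = source.find(tsid);
    if (found == source.end()) return false;
    // Both sets are ordered, as std::includes requires.
    if (!std::includes(found->second.begin(), found->second.end(),
                       replica_gnos.begin(), replica_gnos.end())) {
      return false;
    }
  }
  return true;
}
```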

DETAILED IMPLEMENTATION
===============================================================================

## Step 1. Forward and backward compatible serialization framework

### Serialization framework entities

Serialization framework defines the following entities:

- Serializable - base class for data types that are capable of being
  automatically encoded / decoded using the serialization framework. To define
  a message format, the user needs to derive a custom structure from the
  Serializable class and implement the "define_fields" function. The following
  helpers are available:
  - define_field - enabled for definition of integer, floating point fields,
    enumerations, container types
  - define_field_with_size - enabled for integer fields, allows for definition
    of fixed number of bytes used to represent a particular integer field
  - define_compound_field - for definition of nested messages
- Serializer - base class for serializers, defines what information is
  serialized (value, typecode, size...). Serializer does not specify the
  byte layout of the encoded basic types, but specifies the format of
  supported container types:
  - map / unordered map
  - set / unordered set
  - vector
  - array
  and enumeration types.
  Serializer aggregates an archive and calls its functions to format
  simple types.
- Archive - base class for archives, defines how information is serialized -
  defines format and final byte layout of the basic types:
  - variable length integers
  - fixed-length integers
  - floating point numbers
  - string
  An archive decides how specific basic types are encoded, meaning the format
  (string / binary), and where specific encoded information is saved to or
  loaded from, e.g. vector of bytes, string stream, externally allocated
  memory, file...

### Implementation specifics

Serialization framework is capable of ingesting the following types:
- simple types:
  - (u)int8_t,
  - (u)int16_t,
  - (u)int32_t,
  - (u)int64_t,
  - std::string,
  - float,
  - double
- STL containers:
  - vector
  - map / unordered map
  - set / unordered set
  - array / constant C array (fixed size, which cannot evolve over versions)
- strongly typed enumerations
- types that implement "Serializable" base class interface,
  called "serializable fields"

By default, integer fields are encoded using a variable-length integers format.
The user may change format of the integer by using the "define_field_with_size"
helper, which ingests the fixed number of bytes. The same helper may be used
to specify the maximum number of bytes for the string field. Supported
format definitions are described in the "Message format specification" section
of the HLS.

As mentioned before, the user may call the following helpers in the definition
of the "define_fields" method of the concrete Serializable class:
- define_field - this function ingests:
  - Field reference
  - Definition of the encode predicate. Optional argument, by default it
    is equal to function which always returns true. A concept of the field
    encode predicate is explained in the "Serialization framework. High level
    description" section of the HLS.
  - Definition of the missing field functor. Optional argument, by default it
    is equal to an empty function. A concept of the field
    missing functor is explained in the "Serialization framework. High level
    description" section of the HLS.
  - Definition of the "unknown field policy". Optional argument, by default it
    is equal to "ignore". A concept of the unknown field policy
    is explained in the "Serialization framework. High level
    description" section of the HLS.
- define_field_with_size - this function ingests the fixed size of integer type
  or may be used to define the maximum size of the string. It is enabled
  only for string / integer types. Other function arguments are the same
  as arguments of the define_field helper. Defining a field with size equal
  to "0" means that the default size will be used to encode that field.
- define_compound_field - this function ingests only the reference of
  the serializable field.

To define a serializable field, the user needs to derive their own type from
the "Serializable" base class and define the following functions:
- define_fields, non-const, used during decoding
- define_fields, const, used during encoding

The user is able to calculate the size of the output packet by calling
the "get_size" function of the concrete serializer type with
serializable field passed as the function argument. Output of this function
depends on field values, therefore this function should be called only when
all of the packet fields have values assigned.

The user may also calculate the maximum size of the packet at compile time,
by calling the "get_max_size" function of the concrete serializer type.
Function call will cause compilation to fail if any of the fields has
unlimited size (vector/unbounded string/map/set).

### Extensibility of the serialization framework

The majority of serialization framework is focused on compile time
type deduction. Serialization framework may encode / decode data by itself
or, with further extension, use Protocol Buffers as backend - additional
'Serializer' specialization that will convert simple/STL containers enclosed in
MySQL data structures into fields in classes generated by Protocol Buffers.

The serialization framework includes only one Serializer specialization,
Serializer_default. This serializer saves type id and field value for
each simple field. For each field of type that is implementing "serializable"
interface, it saves type id, serializable size
and calls encoding function. For each variable length field, string field
or field of type that is one of the supported container types,
it saves the number of elements. As mentioned before, other types
of serializers may be developed to serve different purposes.

The serialization framework contains three implementations of an archive:
Read_archive_binary, used in case we only deserialize data, Archive_binary,
which stores encoded bytes internally in vector, and Archive_text
used for debugging purposes (contains text representation of data serialized
by Serializer).

Framework can be further extended to support various types of serializers
and archives. User API is not intended to be changed, the API user is 
always responsible for providing fields definitions (see User API below).

#### Message evolution - backward and forward compatibility of encoded messages

Considering message evolution over time, the user defines whether field is
mandatory or optional for the current version of encoder by specifying the
field encode predicate and for older versions of decoder by specifying behavior
in case included field definition is unknown to older version of decoder.
Let's consider the following message definition in the server version x:

integer field_a

which changes in the server version y to the following definition:

integer field_a,
integer field_b.

In case field_b is required for processing in the current version of the
software, the user should specify the field_b encoding predicate which will
tell encoder to always include field_b in the packet. The user needs to
think what happens in case:

- server with version_y receives message from server with version_x

  Field_b is missing in the packet. The user specifies actions that
  need to be taken by the server (current version) in the field missing
  functor. If this version does not know how to proceed without field_b and
  does not have a way of providing backward compatibility, which means that
  the user decides to break backward compatibility on purpose (unlikely case),
  the user will implement a field missing functor that makes the server
  error out.

- server with version_x receives message from server with version_y

  In this case, the server with version_x receives field_b, whose definition
  is unknown. The user specifies an action in version_y that should be taken
  by the server with version_x in case it receives field_b. Action is limited
  to:
  - generate error
  - ignore field and proceed with message decoding

- server with version_y receives message from server with version_z,
  which is higher than version_y and field_b is missing:

  In that case, server of version y will run a field missing functor.
  Encoding predicate changed in version_z and encoder with version_z
  chose to skip field_b while encoding the packet.

### User API example

An example of a field definition method used during deserialization
(optional fields specify what happens in case the field is not provided by
defining a field missing functor):

        decltype(auto) define_fields() {
          return std::make_tuple(
            define_field(gtid_flags),
            define_field(Uuid_parent_struct.bytes),
            define_field(gtid_info_struct.rpl_gtid_gno),
            define_field(last_committed),
            define_field(sequence_number),
            define_field(immediate_commit_timestamp),
            define_field(original_commit_timestamp,
              Field_missing_functor([this]() -> auto {
                this->original_commit_timestamp =
                  this->immediate_commit_timestamp;
              })),
            define_field(transaction_length),
            define_field(immediate_server_version),
            define_field(original_server_version,
              Field_missing_functor([this]() -> auto {
                this->original_server_version = this->immediate_server_version;
              })),
            define_field(commit_group_ticket,
              Field_missing_functor([this]() -> auto {
                this->commit_group_ticket = Gtid_event::kGroupTicketUnset;
              }), Unknown_field_policy::error)
          );
        }

Deserialization of defined fields inside of Gtid_event decoding method:

        Serializer_default serializer;
        serializer >> *this;

Serialization of defined fields inside of the Gtid_log_event encoding method:

        serializer << *this;

### Generic functionality related to serialization/deserialization of strongly typed enumerations

Additional helper functions are introduced to simplify explicit conversion
between strongly typed enumerations and their underlying types.
Since C++23 defines "to_underlying", MySQL temporarily implements this function
in the std namespace in case the associated macro provided by the STL is not
defined. After switching to C++23, the STL implementation of the function
will be used automatically. Conversion in the opposite direction
is not as straightforward: the code needs to handle the case where the given
value does not correspond to any state of the enumeration type. In that case,
a special "invalid" constant of the enumeration type is returned, and
the caller must handle this situation.
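
A minimal sketch of both conversion directions follows. The enumeration `Example_state`, the function `to_example_state` and the namespace `sketch` are hypothetical; the sketch keeps its own `to_underlying` in a private namespace so that it compiles with any C++17 compiler, which differs from the std-namespace approach described above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Enum -> integer. Same contract as C++23 std::to_underlying.
template <typename Enum>
constexpr std::underlying_type_t<Enum> to_underlying(Enum value) noexcept {
  static_assert(std::is_enum_v<Enum>, "to_underlying requires an enum type");
  return static_cast<std::underlying_type_t<Enum>>(value);
}

// Hypothetical enumeration with a dedicated "invalid" state.
enum class Example_state : std::uint8_t {
  first = 0,
  second = 1,
  third = 2,
  invalid = 255
};

// Integer -> enum. Values that are not a declared state map to 'invalid';
// the caller decides how to react.
constexpr Example_state to_example_state(std::uint8_t raw) noexcept {
  return raw <= to_underlying(Example_state::third)
             ? static_cast<Example_state>(raw)
             : Example_state::invalid;
}

}  // namespace sketch
```

The range check assumes that valid states are contiguous starting at zero; an enumeration with gaps would need an explicit switch instead.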

## Step 2. Changing representation of the GTID

Newly introduced types:

- Trx_tag (implementation of the transaction tag)
- Trx_source_id (implementation of the TSID)

After this WL, the Gtid class will hold the TSID instead of the UUID.
All of the code that uses the UUID will be changed to process the GTID TSID.

The user will assign a transaction GTID by executing SET GTID_NEXT.
Information about the GTID tag will be propagated until the GTID is assigned to
the transaction and the transaction receives a tagged GTID, that is, either
in the Binlog Group Commit, or when it is sent in the Gtid_event and applied
at certification time.
In this step, information about the GTID tag will
be ignored in the system (integration will take place in Steps 3 and 4).

Newly defined constants (static constant expressions):
- gtid_separator (defined as the currently used separator ':')
- gtid_end (a GTID in string form ends at the end of the string,
thus it is defined as '\0')

Gtid parsing functions:
- the currently implemented "parse" function uses the newly defined
parse_gno_str, parse_sid_str and parse_tag_str to separately parse the GNO, SID
and tag of the GTID. It adds the generated TSID to the SID map in case the map
is passed as a function parameter. Please note that one TSID is mapped to one
corresponding SIDNO. Parsing functionality was duplicated in the previous
implementation (one copy reported GTID parsing errors, the other only checked
whether a GTID is correctly defined), so the "parse" function now
takes an additional boolean parameter. This parameter decides whether a GTID
parsing error will be reported or not. The duplicated implementation (contained
in the "is_valid" function) is removed.
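
The split into three parsing helpers plus a reporting flag can be sketched as follows. This is an illustration under assumptions, not the server code: `Parsed_gtid`, `is_uuid_text` and the signatures are hypothetical, the UUID check is reduced to the plain 36-character textual form, the GNO is limited to 18 digits to avoid overflow handling, errors go to `std::cerr`, and the SID map registration is left out. The tag rule follows the definition given in the executive summary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>

constexpr char gtid_separator = ':';

struct Parsed_gtid {
  std::string uuid;
  std::string tag;  // empty for an untagged GTID
  std::int64_t gno = 0;
};

bool is_hex(char c) {
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
         (c >= 'A' && c <= 'F');
}

// Simplified UUID check: 8-4-4-4-12 hexadecimal digits.
bool is_uuid_text(std::string_view text) {
  if (text.size() != 36) return false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool dash = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash ? text[i] != '-' : !is_hex(text[i])) return false;
  }
  return true;
}

// Tag rule: [a-zA-Z_][a-zA-Z0-9_]{0,31}, normalized to lower case.
std::optional<std::string> parse_tag_str(std::string_view text) {
  if (text.empty() || text.size() > 32) return std::nullopt;
  std::string tag;
  for (std::size_t i = 0; i < text.size(); ++i) {
    char c = text[i];
    const bool upper = (c >= 'A' && c <= 'Z');
    const bool lower = (c >= 'a' && c <= 'z');
    const bool digit = (c >= '0' && c <= '9');
    if (!(upper || lower || c == '_' || (digit && i > 0))) return std::nullopt;
    if (upper) c = static_cast<char>(c - 'A' + 'a');
    tag.push_back(c);
  }
  return tag;
}

// Positive decimal number, at most 18 digits (sketch simplification).
std::optional<std::int64_t> parse_gno_str(std::string_view text) {
  if (text.empty() || text.size() > 18) return std::nullopt;
  std::int64_t gno = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return std::nullopt;
    gno = gno * 10 + (c - '0');
  }
  if (gno == 0) return std::nullopt;
  return gno;
}

// Parses UUID:NUMBER or UUID:TAG:NUMBER; reports errors only on request.
std::optional<Parsed_gtid> parse(std::string_view text, bool report_errors) {
  auto fail = [&](const char *part) -> std::optional<Parsed_gtid> {
    if (report_errors) {
      std::cerr << "Malformed GTID (" << part << "): " << text << '\n';
    }
    return std::nullopt;
  };
  const std::size_t first = text.find(gtid_separator);
  if (first == std::string_view::npos) return fail("separator");
  const std::size_t last = text.rfind(gtid_separator);
  if (!is_uuid_text(text.substr(0, first))) return fail("uuid");
  Parsed_gtid gtid;
  gtid.uuid = std::string(text.substr(0, first));
  if (last != first) {
    const auto tag = parse_tag_str(text.substr(first + 1, last - first - 1));
    if (!tag) return fail("tag");
    gtid.tag = *tag;
  }
  const auto gno = parse_gno_str(text.substr(last + 1));
  if (!gno) return fail("gno");
  gtid.gno = *gno;
  return gtid;
}
```

Calling `parse(text, false)` gives the silent validity check that the removed "is_valid" function used to provide, while `parse(text, true)` additionally reports the failing part.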

Note about GTID generation algorithm:

No changes here, already specified TSID (SIDNO) is passed to the 'generate_gtid'
function and only the GNO is generated.

### Gtid persistent state

Changes in GTID persistent state are implemented according to description
in "Persistent State" section of the HLS.

### Changes in Gtid related events include:

- changing of the Gtid_event and Gtid_log_event
  The new GTID event type will receive a new type code:
  GTID_TAGGED_LOG_EVENT (42).
  The 'Gtid_event' class will keep the old code to encode / decode
  GTID_LOG_EVENT and define new code to encode and decode the extensible
  GTID_TAGGED_LOG_EVENT using the serialization framework (Step 1)
- changing of the Gtid set encoding / decoding
  GTID set encoding/decoding will be changed to
  include information about GTID tag, but will be backward compatible. In case
  tag is not provided, representation will be the same as it was before
  introduction of this WL.
- changing GR messages that propagate information about transaction GTID
  (introduction of the optional GTID tag fields)


### COM_BINLOG_DUMP_GTID

COM_BINLOG_DUMP_GTID encodes GTID set according to the format described in
"Protocol" section of the HLS.

## Step 3. Changes in GTID handling/generation in the Binlog Group Commit

GTID is assigned automatically during the Binlog Group Commit flush stage,
in the 'assign_automatic_gtids_to_flush_group' function. In case any of the
transactions has its GTID type specified as "AUTOMATIC", the global SID lock
protecting the global SID map is taken. This functionality remains unchanged;
however, the existing lines of code are now executed for both tagged and
untagged automatic GTIDs.

As an optimization of the SID locking/unlocking mechanism, the function kept
track of the currently locked SIDNO (to lock the server UUID sidno for all
transactions executing the current flush stage).
Now, there may be a larger number of SID locks, not only the one
associated with the source server UUID. Therefore, a newly introduced class
is used, called Locked_sidno_set. This is a simple RAII class that ingests
the current transaction UUID and takes the lock if not already taken. In the
class destructor, it releases all SID locks held.

This way, generate_automatic_gtid is greatly simplified. Branches related to
handling the "generate_gtid" function argument, the "locked_sidno" pointer that
may or may not be a null pointer, are removed. Instead, the "generate_gtid"
function takes the locked_sidno_set passed by the caller and uses the
acquire_unowned_lock function of the "locked_sidno_set" object to acquire locks
for incoming SIDNOs that are not already taken. Neither "generate_gtid" nor
"assign_automatic_gtids_to_flush_group" needs to take care of unlocking,
because it is handled automatically in the Locked_sidno_set class destructor
during stack unwinding.
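
The RAII idea behind Locked_sidno_set can be sketched as below. The class name and `acquire_unowned_lock` are taken from the description above; everything else is an assumption made for the illustration: `Sidno_locks` is a hypothetical stand-in for the per-SIDNO locks owned by the GTID state, it is only safe for the single-threaded demonstration shown here, and its `held()` counter exists purely to make the behavior observable.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>

using rpl_sidno = int;

// Hypothetical stand-in for the per-SIDNO locks of the GTID state.
class Sidno_locks {
 public:
  void lock(rpl_sidno sidno) {
    m_mutexes[sidno].lock();
    ++m_held;
  }
  void unlock(rpl_sidno sidno) {
    --m_held;
    m_mutexes[sidno].unlock();
  }
  int held() const { return m_held; }

 private:
  std::map<rpl_sidno, std::mutex> m_mutexes;
  int m_held = 0;
};

// Remembers which SIDNO locks it took and releases all of them on destruction.
class Locked_sidno_set {
 public:
  explicit Locked_sidno_set(Sidno_locks &locks) : m_locks(locks) {}
  Locked_sidno_set(const Locked_sidno_set &) = delete;
  Locked_sidno_set &operator=(const Locked_sidno_set &) = delete;

  ~Locked_sidno_set() {
    for (rpl_sidno sidno : m_owned) m_locks.unlock(sidno);
  }

  // Takes the lock for 'sidno' unless this set already owns it.
  void acquire_unowned_lock(rpl_sidno sidno) {
    if (m_owned.count(sidno) != 0) return;
    m_locks.lock(sidno);
    m_owned.insert(sidno);
  }

 private:
  Sidno_locks &m_locks;
  std::set<rpl_sidno> m_owned;
};
```

A caller creates one `Locked_sidno_set` per flush group, calls `acquire_unowned_lock` for every SIDNO it meets (repeated SIDNOs are a no-op), and relies on the destructor to release everything, so no unlock call appears on any return path.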

## Step 4. Changes in GTID handling/generation in the GR plugin

Before this WL, GTID generation / handling functions were placed in the
Certifier class. This functionality is now placed in the Gtid_generator class,
which is used in the Certifier.

### Changes in certifier functions:

- Gtid_generator class will ingest a transaction sidno and will keep a map of
  the following key-value pairs:
  sidno and associated Gtid_generator_for_sidno class object. Internal
  Gtid_generator_for_sidno members are:

  - m_sidno - Sidno for which generator will produce a transaction number
  - compute_group_available_gtid_intervals - calculates intervals
    available in the group separately for the defined sidno
  - reserve_gtid_block - reserves the GTID block for the defined
    sidno
  - m_available_intervals (previously group_available_gtid_intervals). This 
    member now holds free intervals for the defined sidno
  - m_assigned_intervals (previously member_gtids). Maps group member UUID to
    currently assigned interval. Map is defined for the defined sidno.

  Gtid_generator class members are:

  - m_gtid_assignment_block_size - GTID blocks generated will have this size
  - m_gtid_generator_for_sidno - holds generator object for each defined sidno
  - get_next_available_gtid - generates GTID for the specific member_uuid and
    sidno
  - recompute - recalculates each sidno generator state, reassigns intervals
  - initialize - initialization function

- changes in the 'certify' function:

  Decisions that the certifier may take are implemented in the newly introduced
  Certification_result enumeration type. Possible states:
  * positive - transaction is certified positively
  * negative - transaction is certified negatively, but there was no error,
    GR may proceed
  * error - transaction was certified negatively, an error occurred

- changes in the GNO generation functions

  Return value change:

  Functions that are responsible for generation of the GNO have a hidden
  control flow encoded in the possible values of generated GNO. Before
  this WL, the function might return:
  * value greater than 0 - generated GNO
  * value 0 (some of the functions only) - GNO was not generated
  * value -1 - GNO numbering exhausted for the given UUID (group UUID), meaning
    that generation of GTIDs is no longer possible - fatal error, stop
  * value -2 (some of the functions only) - free GNO values exhausted in the
    currently used interval. This state indicates that a new interval
    should be assigned for the current member of the group.

  This control flow with GNO values is replaced with newly introduced
  enumeration type, Gno_generation_result with possible states:
  * ok - successful generation
  * gno_exhausted - no free GNO number, error
  * gno_overflow - generated GNO is out of the scope of the current interval

  Changes in function parameters:

  GNO generation functions additionally take the rpl_sidno, which represents
  the specified UUID (may be the group UUID). The available GTID intervals are
  allocated / deallocated with the same scheme as for AUTOMATIC_GTID using
  the group UUID, or rather the server representation of the UUID. If the
  generated GNO is out of the scope of the defined interval, the function
  assigns a new free GNO interval for the given UUID. In case the GR view
  changes, free intervals are re-calculated from the GTID executed set.
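
The return-value change and the interval re-assignment described above can be sketched together. The enumeration `Gno_generation_result` and its states come from the text; `Gno_interval`, `next_gno_in_interval`, `next_gno_with_refill`, the use of the maximum 64-bit value as the exhaustion limit, and the simple "next unreserved GNO" bookkeeping are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

using rpl_gno = std::int64_t;

// Replaces the former sentinel values (-1, -2) encoded in the returned GNO.
enum class Gno_generation_result { ok, gno_exhausted, gno_overflow };

// Hypothetical inclusive interval of GNOs assigned to one member and sidno.
struct Gno_interval {
  rpl_gno next;
  rpl_gno last;
};

// Assumed upper limit of the GNO numbering.
constexpr rpl_gno k_gno_max = std::numeric_limits<rpl_gno>::max();

// Hands out the next GNO of the interval, or says why it cannot.
std::pair<Gno_generation_result, rpl_gno> next_gno_in_interval(
    Gno_interval &interval) {
  if (interval.next >= k_gno_max) {
    return {Gno_generation_result::gno_exhausted, 0};
  }
  if (interval.next > interval.last) {
    return {Gno_generation_result::gno_overflow, 0};
  }
  return {Gno_generation_result::ok, interval.next++};
}

// On gno_overflow, reserves a new block of 'block_size' GNOs and retries once.
std::pair<Gno_generation_result, rpl_gno> next_gno_with_refill(
    Gno_interval &assigned, rpl_gno &next_unreserved, rpl_gno block_size) {
  auto result = next_gno_in_interval(assigned);
  if (result.first != Gno_generation_result::gno_overflow) return result;
  if (next_unreserved > k_gno_max - block_size) {
    return {Gno_generation_result::gno_exhausted, 0};
  }
  assigned = Gno_interval{next_unreserved, next_unreserved + block_size - 1};
  next_unreserved += block_size;
  return next_gno_in_interval(assigned);
}
```

Callers branch on the enumeration instead of comparing the GNO against magic numbers, which keeps a valid GNO and an error condition from sharing the same value space.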

### Additional refactorings: 

  Functionality such as the creation of the Transaction Termination Context
  is delegated to separate Certifier helper functions.

  Certification code is changed to use the RAII MUTEX_LOCK macro defined in
  MySQL.


## Step 5. Protect execution of GTID_NEXT 'tagged' with privileges

This step implements a new privilege: TRANSACTION_GTID_TAG.
SET GTID_NEXT=UUID:TAG:NUMBER and SET GTID_NEXT=AUTOMATIC:TAG
are allowed to run under the privileges specified in the "USER INTERFACE"
section of the HLS.

## Step 6. Allow GTID_NEXT with a tag to run under certain GTID modes. Allow changing GTID mode to compatible mode in case sessions with tag assigned are running.

This step implements:

### Allowing GTID_NEXT with a tag to be set under allowed GTID modes specified

When set_gtid_next function is executed, current mode of GTID is checked and
if it is not compatible with the value of GTID_NEXT, an appropriate error
is shown. Errors are specified in the "FAILURE MODEL SPECIFICATION" section
of the HLS.

### Allowing changing GTID mode to compatible mode in case there are sessions with GTID_NEXT set to AUTOMATIC:

When the 'global_update' function is executed and the new mode is
OFF_PERMISSIVE, the number of running sessions with GTID_NEXT set to
'AUTOMATIC:TAG' is checked. If the number is zero, setting the mode to
OFF_PERMISSIVE is allowed. Otherwise, the command fails with an error.

The counter of sessions running with GTID_NEXT set to 'AUTOMATIC:TAG' is
implemented inside of the 'Gtid_state' class. Sessions increment or decrement
this counter in the 'set_gtid_next' function in case GTID_NEXT changed
from 'untagged' to 'tagged' or from 'tagged' to 'untagged'. The counter is also
decremented in case a session is destroyed while GTID_NEXT was set to AUTOMATIC
with a tag.
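
The counter logic can be sketched as follows. This is an illustration only: `Gtid_state_sketch`, its method names and the boolean parameters are hypothetical, and the real 'Gtid_state' class and 'global_update' function carry far more state and locking than is shown here.

```cpp
#include <atomic>
#include <cassert>

enum class Gtid_mode { off, off_permissive, on_permissive, on };

// Hypothetical reduction of the counter kept inside Gtid_state.
class Gtid_state_sketch {
 public:
  // Called when a session changes GTID_NEXT; only transitions between
  // "automatic with tag" and anything else move the counter.
  void on_set_gtid_next(bool was_automatic_tagged, bool is_automatic_tagged) {
    if (!was_automatic_tagged && is_automatic_tagged) {
      ++m_automatic_tagged_sessions;
    } else if (was_automatic_tagged && !is_automatic_tagged) {
      --m_automatic_tagged_sessions;
    }
  }

  // Called when a session is destroyed.
  void on_session_destroyed(bool was_automatic_tagged) {
    if (was_automatic_tagged) --m_automatic_tagged_sessions;
  }

  // Mirrors the check described for the mode change: switching to
  // OFF_PERMISSIVE is refused while tagged automatic sessions exist.
  bool is_mode_change_allowed(Gtid_mode new_mode) const {
    if (new_mode != Gtid_mode::off_permissive) return true;
    return m_automatic_tagged_sessions.load() == 0;
  }

 private:
  std::atomic<int> m_automatic_tagged_sessions{0};
};
```

A session that sets GTID_NEXT to AUTOMATIC with a tag blocks the switch to OFF_PERMISSIVE until it either changes GTID_NEXT back to an untagged value or disconnects.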

## Step 7. Rename remaining '*sid*' related type names, function names and variables into '*tsid*'

This step renames type names, function names and variables that contain "sid"
but use TSID definitions, so that they contain "tsid" instead, e.g.
Sid_map -> Tsid_map

## Step 8. Keep track of the next_free_gno for AUTOMATIC tagged GTIDs

This step changes next_free_gno from an integer variable into an
unordered map which tracks next_free_gno for each registered sidno.
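
A minimal sketch of the per-sidno tracking is shown below. The class `Next_free_gno_tracker` and its `take` method are hypothetical, and the sketch assumes that numbering starts at 1 for each sidno and that there are no gaps to search for in the executed set, which the server would have to consider.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using rpl_sidno = int;
using rpl_gno = std::int64_t;

// One next_free_gno per sidno instead of a single integer.
class Next_free_gno_tracker {
 public:
  // Returns the next free GNO for 'sidno' and advances the stored value.
  rpl_gno take(rpl_sidno sidno) {
    // First use of a sidno starts its numbering at 1 (assumption).
    auto it = m_next_free_gno.try_emplace(sidno, 1).first;
    return it->second++;
  }

 private:
  std::unordered_map<rpl_sidno, rpl_gno> m_next_free_gno;
};
```

Because each tag combined with the server UUID maps to its own sidno, tagged and untagged automatic GTIDs advance independently of each other.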

## Step 9. Extend checking replica GTIDs during connection to the source

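This step implements the is_subset_for_sid function in the Gtid_set.
This function checks whether this GTID set is a subset of the given GTID set,
comparing the intervals stored under the given subset sidno in this set with
the intervals stored under the given superset sidno in the given set.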
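
The check can be sketched on a reduced model of a GTID set. Only the function name and its two sidno parameters come from the text; `Gtid_set_sketch`, `Interval`, `is_interval_subset` and the representation (per-sidno vectors of sorted, disjoint, non-adjacent half-open intervals) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using rpl_sidno = int;
using rpl_gno = std::int64_t;

// Half-open interval [start, end) of GNOs.
struct Interval {
  rpl_gno start;
  rpl_gno end;
};

// Sorted, disjoint, non-adjacent intervals (assumed normal form).
using Intervals = std::vector<Interval>;

// True when every interval of 'sub' lies inside one interval of 'super'.
bool is_interval_subset(const Intervals &sub, const Intervals &super) {
  std::size_t j = 0;
  for (const Interval &iv : sub) {
    // Skip superset intervals that end before this interval starts.
    while (j < super.size() && super[j].end <= iv.start) ++j;
    if (j == super.size()) return false;
    if (super[j].start > iv.start || super[j].end < iv.end) return false;
  }
  return true;
}

struct Gtid_set_sketch {
  std::map<rpl_sidno, Intervals> intervals_by_sidno;

  // The two sets may number the same TSID differently, hence two sidnos.
  bool is_subset_for_sid(const Gtid_set_sketch &super,
                         rpl_sidno superset_sidno,
                         rpl_sidno subset_sidno) const {
    const auto sub_it = intervals_by_sidno.find(subset_sidno);
    if (sub_it == intervals_by_sidno.end() || sub_it->second.empty()) {
      return true;  // nothing to cover
    }
    const auto super_it = super.intervals_by_sidno.find(superset_sidno);
    if (super_it == super.intervals_by_sidno.end()) return false;
    return is_interval_subset(sub_it->second, super_it->second);
  }
};
```

During connection, the replica side would call this once per sidno of interest; a single GNO held by the replica but missing on the source makes the check fail for that sidno.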