WL#6631: Detect transaction boundaries

Affects: Server-5.7 — Status: Complete

Description
Requirements
Dependent Tasks
High Level Architecture
Low Level Design

NARRATIVE
---------

Add server flag to detect that a new transaction started.

DESCRIPTION
-----------

In a load-balanced setup, it is necessary to know when a statement resulted in
the start of a new transaction, which would allow connectors to switch to using
a different connection from the connection pool.

The critical case to detect is when the transaction is "fresh" and does not have
any reads or writes attached to it yet. This is the only case where a connector
can switch to using a different connection. If a statement starts a new 
transaction and starts adding reads or writes to it, it is not possible to move
the connection since it would imply a rollback.

To handle this case, a new server status item could be added to the response
packet to indicate when "work" has been added to a transaction.

The existing SERVER_STATUS_IN_TRANS flag provides baseline functionality to this
end, letting as detect when a transaction is active. This is sufficient
information to avoid moving a connection while a transaction is ongoing.
It does not however let us detect edges, so a session of the form
BEGIN ... COMMIT AND CHAIN ... COMMIT AND CHAIN ... COMMIT AND RELEASE
would have the flag set for its entire lifetime, and consequently, the
load-balancer would be unable to re-locate it.

Additional information indicating that a transaction may be active, but no
reads or writes have been attached (and there are no LOCKed TABLES)
would let the load balancer move connections more aggressively.

To break up such a "fresh" transaction (or element of a transaction
chain), the load balancer should be supplied with information about any flags
("READ ONLY", "ISOLATION LEVEL", "WITH CONSISTENT SNAPSHOT", ...) used in START
TRANSACTION, so the transaction can be recreated elsewhere with the same
characteristics. (In fact even flags set with SET TRANSACTION before the
transaction starts should be tracked correctly.)

FUNCTIONAL REQUIREMENTS

Func-Req 1  @@session_track_transaction_state

A new system variable @@session_track_transaction_state shall be introduced
and govern the operation of the functionality described in this WL item.
This variable can be set per SESSION and inherits its initial value from
the GLOBAL setting. The global default is 0 - OFF.

Func-Req 1.1  "off"

A value of "OFF" disables the transaction tracker.

Func-Req 1.2  "track transaction state"

A value of "STATE" enables the transaction state tracker; this will let a
client know whether a transaction is under way, and whether it could be rolled
back (see below). To actually be able to safely re-locate the session, a client
must also subscribe to the transaction characteristics tracker:

Func-Req 1.3  "track transaction state and transaction characteristics"

A value of "CHARACTERISTICS" enables the transaction state tracker as well as
the transaction characteristics tracker. The characteristics tracker
will let a client know how to restart a transaction in another session so it
will have the same characteristics (isolation level etc.) as in the original
session (see Func-Req 4).

As characteristics may be set using SET TRANSACTION before a transaction is
started, it is not safe for the client or proxy to assume that would never be
transaction characteristics if no transaction is active. It would therefore be
unsafe not to track transaction characteristics and "just switch the connection
when no transaction is active" (whether this is detected by the transaction
state tracker or the traditional SERVER_STATUS_IN_TRANS flag). A client wishing
to re-locate a session that might use transactions MUST subscribe to transaction
characteristics!

Func-Req 2  Reporting mechanism

Updates shall be reported through the tracking system specified in
WL#4797 et al.  As specified there, changes in the status of tracked
items will be reported in response to a statement.

Func-Req 3  What is reported

The new functionality must when enabled provide the client with
information whether a transaction is under way, and whether it can safely
be cancelled, for instance to restart it elsewhere.

Func-Req 3.1  Reported transaction states

If transaction state is tracked (see Func-Req 1.2), the server shall provide the
subscriber with the full range of currently active states.

To be able to restart certain transactions in a new session, the client or proxy
must also subscribe to the transaction characteristics tracker (Func-Req 1.3,
Func-Req 4).

3.1.1  "transaction in progress"

3.1.1.1  explicit and implicit transactions

Transactions will be flagged in a way that lets the client distinguish between
implicit and explicit transactions.

3.1.1.1.1  "explicit transaction active"

If @autocommit is 1, a transaction may be started explicitly (with BEGIN/START
TRANSACTION, or COMMIT|ROLLBACK AND CHAIN). It may be ended explicitly
(COMMIT/ROLLBACK) or implicitly (by using a statement that forces an implicit
commit, such as DROP TABLE), changing the state to "no transaction ongoing".

3.1.1.1.2  "implicit transaction active"

If @autocommit is 0, a transaction may be started implicitly or explicitly.

- If no transaction is in progress, an explicit transaction may be started as
per 3.1.1.1.1.
- If no transaction is in progress, an implicit transaction may be started by
using statements that access tables, but do not force an implicit commit.
- If no transaction is in progress, and only commands are used that do not start
a transaction implicitly or explicitly as described above, the state will remain
"no transaction ongoing."

- If a (implicit or explicit) transaction is in progress, BEGIN/START
TRANSACTION will complete the current transaction, and start a new explicit one.
(i.e. the resulting state is always "in explicit trx" even if the previous
transaction was implicit)

- A (implicit or explicit) transaction will also end when a statement forces an
implicit or explicit commit or rollback (explicitly via COMMIT/ROLLBACK without
chaining; implicitly through DROP TABLE etc.). In this case, the state will
revert to "no transaction ongoing".

Corollary:

"no trx" can change to "explicit trx" through BEGIN.
"no trx" can change to "implicit trx" through a statement that uses tables, but
does not force an implicit commit.

"implicit trx" can change to "explicit trx" through BEGIN/START TRANSACTION.

"implicit trx" AND "explicit trx" can return to "no trx" through an explicit
COMMIT/ROLLBACK, or an implicit one (DROP TABLE etc.)


3.1.1.2  "no work attached"

If a transaction is ongoing, but no tables were written to or read, a load
balancer may relocate the transaction to another session with comparatively
little effort. This state is entered after a segment of a commit chain is
explicitly started, e.g. with BEGIN or START TRANSACTION. (Note that WITH
CONSISTENT SNAPSHOT may start off with "transactional read attached" instead, 
see next entry.)

The state of "no work attached" is not explicitly flagged as such; it is implied
by the absence of all of "transactional read", "transactional write",
"non-transactional read", "non-transactional write."

3.1.1.3  "work attached"

3.1.1.3.1  "transactional writes"

Transactional writes may be rolled back. A proxy could in theory abort a
transaction, restart it elsewhere, and replay the writes in the new session;
such a proxy may read "transactional writes pending" as "may relocate."

Proxies that do not implement this advanced functionality may signal a
reconnection/loss of the session state to the application (through
CR_SERVER_LOST and friends) and relocate the connection in the understanding
that the server will rollback the writes, and the client will restart the
transaction or signal a failure to the user (as in a fail-over situation).
It is however strongly encouraged that proxies and libraries that cannot replay
writes themselves interpret "transactional writes pending" as "do not relocate
this session."

3.1.1.3.2  "non-transactional writes"

As a rollback is impossible for non-transactional writes, a proxy or library
must interpret the presence of such work on the transaction as "do not relocate
this session."

3.1.1.3.3  "reads"

The server shall separately flag transactional and non-transactional reads.

Reads tend to occur in one of three contexts:

- In statements such as CREATE...SELECT or INSERT...SELECT, data is read, then
written. In such cases, the proxy or library will want to include the flags for
writes into its decision-making (see 3.1.1.3.1 vs 3.1.1.3.2).

- In statements such as SELECT ... FROM ... INTO @variable, the statement can be
replayed (and is no hindrance to the relocation of the session) if @variable is
tracked.

- Statements that read tables and produce a result set that is send to the
client. While a proxy could re-send those queries, it couldn't easily re-send
the results to the client.

To identify this last case,

3.1.1.3.4  "result sets"

The implementation shall flag the presence of a result set.

As we've seen above, there can be reads that do not result in a result set being
sent to the client. Conversely, some statements (SHOW WARNINGS, certain SELECTs,
etc.) may produce a result set without accessing tables.

a) For purely deterministic queries (algebra etc.), this is safe.

b) For queries involving variables, this is safe if the variables are tracked.

c) For queries using non-deterministic functions (such as UUID()), the
implementation shall flag "unsafe statement" if no tables are involved.

d) If a transaction is active, reads from tables will be reflected in the 
status as "transactional read pending" or "non-transaction read pending",
respectively, see 3.1.1.3.3.

By implication, "result_set | unsafe_stmt" implies "do not relocate."
"result_set" and either of "transactional_reads" or "non_transactional_reads"
likewise suggest "do not relocate."

3.1.2  "no transaction started"

Neither an explicitly started transaction nor an implicitly
started one is currently in progress.

If @autocommit is 1, the session is in this state before
a transaction is explicitly started (with BEGIN or START
TRANSACTION). After a transaction ends (with COMMIT or
ROLLBACK), the server once again enters this state until
the next transaction is started.

If @autocommit is 0, the server may be in an explicitly
started transaction (BEGIN/START TRANSACTION) as per above,
or it may be in an implicit transaction that will end when
it is explicitly committed or rolled back, when a new
transaction is explicitly started, or when a statement is
issued that forces an implicit commit (e.g. DROP TABLE).

The state of "no transaction started" is not explicitly flagged as such; it is
implied by the absence of all of "implicit transaction active", and "explicit
transaction active."

3.1.3  "tables locked"

An active LOCK TABLES shall be reflected in the flags, to
discourage load balancers from relocating the session while
in that state; a characteristics item is not generated for
this case.

Func-Req 3.2  Enabling/disabling transaction state tracking

The server shall provide such information as specified in 3.1 if the
new system variable @@session_track_transaction_state has a value other
than "off", and tracking (see WL#4797 et al.) is enabled.

Func-Req 3.3  Scope/Lifetime

The scope of the tracker item is the transaction: all state-indicating
flags persist until the transaction is committed or rolled back.
As statements are added to the transaction, additional flags may be
set in the tracker item. However, no flags will be cleared until the
transaction ends.

Example:

After a sequence of BEGIN; INSERT ...; SELECT ...; the flags should
indicate an active transaction, a write, and a read ("everything that
happened so far in this transaction"), rather than just the state
resulting from the latest statement (the "read" state resulting from
the SELECT).


Func-Req 4  Tracking characteristics

The new functionality must when enabled provide the client with
information about how to restart a connection in a different session.

When using the characteristics tracker to migrate a connection, it is
the responsibility of the client to also track the system-variables
tx_read_only and tx_isolation_level to correctly replicate session default.

Nomenclature:

A "one-shot characteristic" in the context of this specification shall
be understood to be a transaction characteristic that satisfies all of
the following:

- it is in scope below GLOBAL and SESSION

- it is set with SET TRANSACTION while no transaction is ongoing, or
  with START TRANSACTION

- it pertains only to the next transaction (snapshot) or chain of
  transactions (read only/read write, isolation level)

Until an explicit transaction is started, one-shots are *potential*
characteristics of the next transaction.

This nomenclature is unrelated to the deprecated ONE_SHOT particle 
(and its thd->one_shot_set).


Func-Req 4.0  Format

The information on how to restart a transaction shall be reported
as a list of MySQL statements separated by semicoli. This list may
contain none, one, or several statements. Executing these statements
in a different session is intended to create a transaction with the 
same characteristics as that in the original session (ISOLATION LEVEL,
READ WRITE|ONLY, WITH CONSISTENT SNAPSHOT, ...) if an explicit
transaction is action in the original session. Otherwise, the
statements are intended to set up any one-shots active in the
original session.

Func-Req 4.1  START TRANSACTION READ ONLY|WRITE

READ ONLY/READ WRITE characteristics explicitly set in START TRANSACTION
or SET TRANSACTION shall be reported: If the client gave it explicitly,
we report it explicitly (even if it does not differ from the session
default). Conversely, if the client resets it (by setting the session
value), we'll stop reporting it explicitly:

Func-Req 4.2  Interaction of session variables and one-shots for READ ONLY|WRITE

Setting a session value for READ ONLY/READ WRITE characteristics (using SET
SESSION TRANSACTION [ READ ONLY | READ WRITE ]; or SET
@@SESSION.TX_READ_ONLY=...;) will clear any existing READ characteristics
one-shot (while not affecting an ISOLATION LEVEL one-shot).
It shall therefore be reported through the session
tracking specified in WL#4797. It is the client's responsibility to
track that session variable. Additionally, an updated set of one-shot
characteristics shall be transmitted, removing the READ characteristics one-shot
if one was formerly set.

Func-Req 4.3  Restarting transactions that were part of a CHAIN (READ
characteristics)

When an explicit transaction is active, the tracked item shall contain valid
(My-) SQL statement(s) suitable for restarting the transaction.

The implementation is at liberty to rewrite

  SET TRANSACTION ;
  START TRANSACTION;

into

  START TRANSACTION ;

and vice versa.

Func-Req 4.4  SET TRANSACTION ISOLATION LEVEL

ISOLATION LEVEL characteristics explicitly set in SET TRANSACTION shall
be reported: If the client gave it explicitly, we report it explicitly
(even if it does not differ from the session default). Conversely, if
the client resets it (by setting the session value), we'll stop reporting
it explicitly:

Func-Req 4.5  Interaction of session variables and one-shots for ISOLATION LEVEL

Setting a session value for ISOLATION LEVEL characteristics (using SET SESSION
TRANSACTION ISOLATION LEVEL ...; or SET @@SESSION.TX_ISOLATION=...;) will clear
any existing ISOLATION LEVEL characteristics one-shot (while not affecting a
READ ONLY|READ WRITE one-shot). It shall therefore be reported through the
session tracking specified in WL#4797. It is the client's responsibility to
track that session variable. Additionally, an updated set of one-shot
characteristics shall be transmitted, removing the ISOLATION LEVEL
characteristics one-shot if one was formerly set.

Func-Req 4.6  CONSISTENT SNAPSHOT

WITH CONSISTENT SNAPSHOT characteristics explicitly set in
START TRANSACTION shall be reported.

Func-Req 4.7  CONSISTENT SNAPSHOT in CHAINS

WITH CONSISTENT SNAPSHOT will be reset to "false" when the first
transaction of a chain ends (e.g. with COMMIT [AND CHAIN]), but
as MySQL allows the use of START TRANSACTION [WITH CONSISTENT SNAPSHOT]
within an ongoing transaction, effectively in lieu of COMMIT AND CHAIN,
this characteristic must be correctly reported in such cases.

Func-Req 4.8  Scope

If an explicit transaction is active, the characteristics item shall
contain the required statements to restart this transaction (or this
link of the transaction CHAIN).  If no explicit transaction is active,
the item shall contain the required statements to replicate the current
one-shots (if any).  The implementation should update this tracker item
as new information becomes available.

Func-Req 5 / Dependencies

This WL utilizes functionality added in WL#4797 and WL#6885 to
report changes in its tracked items, and code added in WL#7766
that provides the same information for server replies that
contain result sets.

WL#7766: Deprecate the EOF packet

TRANSACTION STATUS

Transaction state shall be reported as a sequence of ASCII encoded
characters. Each active state shall have a unique character assigned
to it as well as a fixed position in such a string.

The following states may be signaled:

Place 1: Transaction.
  T  explicitly started transaction ongoing
  I  implicitly started transaction (@autocommit=0) ongoing
  _  no active transaction

Place 2: unsafe read
  r  one/several non-transactional tables were read
     in the context of the current transaction
  _  no non-transactional tables were read within
     the current transaction so far

Place 3: transactional read
  R  one/several transactional tables were read
  _  no transactional tables were read yet

Place 4: unsafe write
  w  one/several non-transactional tables were written
  _  no non-transactional tables were written yet

Place 5: transactional write
  W  one/several transactional tables were written to
  _  no transactional tables were written to yet

Place 6: unsafe statements
  s  one/several unsafe statements (such as UUID())
     were used.
  _  no such statements were used yet.

Place 7: result-set
  S  a result set was sent to the client
  _  statement had no result-set

Place 8: LOCKed TABLES
  L  tables were explicitly locked using LOCK TABLES
  _  LOCK TABLES is not active in this session



INTERPRETING TRANSACTION STATUS

The server provides a mechanism to supply certain information.
The implementation of policy is ultimately up to the consumers of transaction
status (libraries, proxies, etc.). That said, consumers deviate from the
following suggestions / default policies at THEIR peril.

- CLIENTS

A client SHALL NOT abort a transaction that has unsafe writes on it.
It SHOULD NOT abort transaction that have unsafe reads (depending on business
logic).
It SHALL NOT relocate a session while LOCK TABLES is active.
It MAY abort all other transactions.

- LOAD-BALANCERS

A transparent proxy MUST NOT abort a transaction that has unsafe writes on it.
It MAY roll back transactions that have transactional writes on them (but
this seems to have little real world relevance as it would have to replay all
work on the server the connection is switched to).
Likewise, it SHOULD NOT abort a transaction that has reads to it that sent a
result set to the client, as even if the load-balancer were to replay to queries
on the new server, it would have no way to transparently update the result set
in the client. There are scenarios conceivable (with de facto read-only tables
etc.) were such reads are harmless, but such a decision would necessarily lie in
the purview of client and, as its agent, the load-balancer, not the MySQL
server, and as such lies beyond the scope of this document.
The proxy SHALL NOT relocate a session while LOCK TABLES is active.



TRANSACTION CHARACTERISTICS

At the time of this writing, the following have a bearing on a
transaction:

- ISOLATION LEVEL
- READ ONLY / READ WRITE
- WITH CONSISTENT SNAPSHOT



SET TRANSACTION [READ ONLY | READ WRITE] [ISOLATION LEVEL ]

MySQL offers two system variables, tx_isolation and tx_read_only.
These are distinct from most system variables offered in that they
have three scopes:

- GLOBAL
- SESSION
- "oneshot" (default scope, for the next transaction only)

The SESSION value is copied from the GLOBAL value when the session
is created (as usual), and the the oneshot value is similarly set to
the SESSION default at session creation, and then again every time
a transaction completes. The oneshot value can be overwritten for the
next transaction, provided no transaction is in progress at the time (i.e.
it can be set before a transaction begun with START/BEGIN, *not* one started
using CHAIN.  To do this, scope is simply omitted from SET TRANSACTION.

For the GLOBAL and SESSION values, SET  TRANSACTION READ ... is
equivalent to SET  tx_read_only=..., and SET  TRANSACTION
ISOLATION LEVEL  is equivalent to SET 
tx_isolation_level=.

Importantly, changing the SESSION value for tx_isolation or tx_read_only
with either method will *also* set the oneshot value (i.e. reset it to
the (new) session default) if no transaction is currently in progress.

Finally, on implicit commit or single-statement commit/rollback,
one-shot values of tx_read_only and tx_isolation are also reset to
session defaults.

http://dev.mysql.com/doc/refman/5.7/en/set-transaction.html



START TRANSACTION [WITH CONSISTENT SNAPSHOT] [READ ONLY | READ WRITE]

READ ONLY or READ WRITE, of which understandably only one may be given,
will set the corresponding characteristics for the current transaction
(where the READ characteristic will overwrite any one-shot setting previously
given with SET TRANSACTION ...).



XA transactions

XA START|BEGIN, XA COMMIT, and XA ROLLBACK shall set and clear
"explicit transaction ongoing" in a manner analogous to that
specified for "regular" transactions. The implementation shall
report XA BEGIN (and attempt to report the xid) when CHARACTERISTICS
are tracked.



TRANSACTION CHARACTERISTICS TRACKER

A client should subscribe to the transaction characteristics tracker if it may
wish to switch its session to another connection at some point and transactions
may be used.

The tracker item may be empty, or it may contain one of several SQL statements:

- if a transaction was explicitly started, the item contains the statement(s)
required to restart the transaction with the same characteristics. As a general
rule, this will be a "START TRANSACTION" statement (possibly with READ
ONLY|WRITE and/or WITH CONSISTENT SNAPSHOT). If any characteristics apply that
cannot be passed to START TRANSACTION, such as ISOLATION LEVEL, a suitable SET
TRANSACTION statement will be prepended (e.g. "SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE; START TRANSACTION READ WRITE;").

- if a transaction was not explicitly started, but characteristics one-shots
were set up, a SET TRANSACTION statement suitable for replicating that set up
will be generated (e.g. "SET TRANSACTION READ ONLY").

- if no characteristics apply, the tracker item will be empty.



TRACKING FOR LOAD BALANCERS

A proxy that wishes to switch a transaction that does not have DML yet
should therefore track a minimum of transaction state, transaction
characteristics, and the session variables tx_read_only and tx_isolation_level.