WL#7293: Make sure that when 32-bit connection-IDs roll over, still in-use IDs are not re-issued

Status: Complete   —   Priority: Medium

PROBLEM

Even when the server is compiled for 64-bit (and connection-IDs
consistently had that type internally, which currently they don't),
the wire-protocol still only specifies 32 bits for ID. Short of
requiring all apps (including those our customers may not have
the source for) being switched to a 64 bit protocol all at once,
support for the 32 bit IDs must remain for the foreseeable future.

The problem with this is that some customers have a sufficient
number of connections for this 32-bit ID to roll over (exceed
the data-type's number space, and start at 1 again, much like
in the Y2K bug).

In most cases, the "low numbers" were used by connections that
have long disappeared, so the IDs could be "recycled" and be
issued again.  However, there are some outliers, thread_IDs
that may persist for a long time (such as replication threads
or the event scheduler, perhaps monitoring), so we must prevent
thread_IDs that are actually still in use from being re-issued
to new threads (as if two threads have the same ID, a KILL <ID>
would be ambiguous).

Finally, there is a possible-but-unlikely race condition:
thread A is created
thread B saves A's connection-ID
thread A eventually disconnects
thread C is created, and gets the now free ID thread A had
thread B tries to KILL thread A, but instead KILLs C
         as that now has the ID previously associated with A,
         and B is unaware that A went away and was "replaced".



PROPOSAL

REQUIRED:
The server must be prevented from issuing a connection ID that
is currently in use.

* For minimum invasiveness in the code base and maximum
  compatibility with legacy clients, it is proposed that
  the ID space remain at 32-bit for the foreseeable future.

* The current C-API (libmysql) and SQL functions will likewise
  remain available for backward compatibility.

* The server MUST prevent in-use IDs from being re-issued.


DEFERRED:
As the aforementioned race condition is very unlikely to occur
and the customer has confirmed they do not require this functionality,
and even many operating systems do not protect against similar issues,
the handling the race condition shall be considered beyond the
scope of this implementation.
If another case is later opened to add such a safeguard, the following
method is suggested for consideration:

* The server SHOULD send compatible clients a tracker-item
  containing a timestamp (time_t) on connect.

* Clients wishing to avoid the aforementioned race condition
  MAY use new syntax that submits both connection ID and
  connect-timestamp, either as KILL <ID> STARTED AT <timestamp>
  or the equivalent mysql_kill_safe(MYSQL *, id, timestamp)

* This will uniquely identify or current or historic connection
  throughout the lifetime of the MySQL server even if a connection-ID
  has been re-issued.



INTERESTED PARTIES

Runtime, Support, Protocol-interested Group, Replication.



FURTHER READING

BUG#44167

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html
Func-Req 1

In busy installations, more than 2^32 connections may be made, at
which point the connection-ID will roll over and start at 0
once again.
At this point, the implementation must necessarily re-issue
connection-IDs that have been issued before:

Func-Req 1.1

The implementation must avoid re-issuing IDs that are actually
still in use.

Func-Req 1.2

The implementation must address threads which have thread_IDs,
but are not actually connections in the strict sense, such as
the event scheduler.

NFR 1

The update should clean up the code to the effect that the
designated data-type for connection-IDs (my_thread_id) is
used consistently throughout, and that format-strings referring
to such items are adjusted and have their arguments cast accordingly.
Changes to interface specification: n/a

The selected design has as its only observable feature the fact 
that duplicates of connection-IDs that are in use at a given 
time are prevented from being issued. Keeping the protocol 
unchanged was a specific design goal, and there are no configuration
options or monitors:

No changes to options, system variables, SQL, server errors, client 
errors, files, log, monitoring, protocol, plugins, API, or information
schema.
OTHER CONSIDERATIONS / BACKGROUND

This section discusses details of the current implementation
in trunk, alternative solutions, and why they were not chosen.



"32-bit connection-ID externally"

The wire protocol currently reserves 4 bytes / 32 bit for
connection IDs.  Connection-IDs are sent by the server in
the handshake packet.  This specifically means that the
ID is sent exactly once -- when the connection is established.
(If the connection breaks and is re-established, either explicitly
by the clients, or implicitly because auto-reconnect is enabled,
this happens again, and we receive a new connection ID.
Auto-reconnect is a special consideration here, as it transparently
changes the connection-ID, see below.)

Connection IDs are also used for temp-tables; the current
code again works with 32 bit, even on 64-bit platforms.

binlog likewise uses 32 bit.



"64-bit connection-ID internally"

my_thread_id is defined to ulong in my_pthread.h, meaning that
connection-IDs are nominally 32-bit on 32-bit platforms, and
64-bit on 64-bit platforms.  However, the my_thread_id type is
not currently used everywhere it should be.

On SQL level, everything has been updated (SELECT CONNECTION_ID(),
KILL, column type for connection-IDs in log tables) to cope with
64-bit wide IDs.



"64-bit everything" -- far future

64-bit IDs would push back the ID roll-over far enough for it to be
considered "safe for all practical scenarios."  Seeing as this would
require a consistent use of the correct data-type, extensive testing,
update of temp-table handling, update of the protocol, etc., this is
no short-term solution.  (It would also arguably imply redefining the
data type so that connection-IDs are 64-bit even on 32-bit platforms.)
Even so, unless this new 64-bit server would only accept connections
from clients compiled against a matching 64-bit client library, the
server would still have to support 32-bit clients.  This would make
the implementation more complicated and brittle -- would 32-bit clients
draw IDs from 1..2^32-1, either with the same rollover issues as before,
or with one of the rollover protections described here, while 64-bit
clients draw safe IDs from 2^32..2^64-1?  Seeing as 32-bit clients likely
won't disappear soon, due to legacy clients (and new clients if we don't
force 64-bit IDs even on 32-bit clients), "64-bit everything" is non-trivial.



"Limitations of solutions utilizing the current SQL layer"

Since the SQL layer nominally supports 64-bit (but needs more
testing), clients could in theory SELECT CONNECTION_ID(); ... KILL @x;
even for 64-bit IDs.  If however an auto-reconnect happens during
our above "...", we'd once again be using a stale connection-ID.
(Our SELECT returned the ID our connection had before it broke;
after reconnect, a new ID was issued.

This could be alleviated with a 64-bit tracker item sent to 
compatible clients on connect, but this would once again raise 
the issue of different numberspaces and different handling for 
old and new clients.

A workaround for having to handle 32-bit and 64-bit differently would be
updating the server like so:

- once mysqld in 64-bit exceeds 32-bit numberspace, it will always
  send wire protocol ID of 0. It will always send a correct 64-bit ID
  to compatible clients as a tracker item.
- if mysqld receives a mysql_kill(0), or a KILL 0, it refuses the request
  This will prevent old clients from killing the wrong thread.

This combination should have the following effect:
mysqld32: all clientlib and app combinations work as before
mysqld64: clientlib64 will always work.
          legacy clientlib32 will work as before in smaller
          installations; in high-request settings, the API
          function mysql_kill() will be rejected once 32-bit
          numberspace is exceeded.

As a business consideration, this may break legacy apps that a
client may not have the option to easily recompile.

If we allow a mixed mode -- 64 bit for clients that support it,
with a sub-range 1..2^32-1 used by legacy clients that don't,
we're back where we started: we need to address the case of the
32-bit IDs rolling over.  (But now, we've got two problems:
one thread-ID counter for 32-bit IDs, and one for 64-bit IDs.)



"Alternative protection against race-condition"

(A footnote in case addressing the race-condition is reconsidered
at a later date.)

An alternative solution for the (deferred) issue of the race 
condition described implemented a server-side only model that
would not make an ID available to be re-issued immediately
after its connection ends, but would reserve it for a configurable
time (not entirely unlike SO_LINGER/TIME_WAIT and friends).
The advantage of that model is that it has no need for the client
side to use a current/updated library.
The disadvantage of that model is that it's more invasive on the
server side (requiring memory for the block/linger-list, and code
for housekeeping of that list, with the least locking overhead
possible).
In contrast, the overhead of the "shared solution" (updated server 
and updated client-lib co-operating if the client desires protection
against the race condition) shows minimal overhead (memory-wise,
the additional overhead per THD is sizeof(time_t); code-wise the onus
is partially on the client-app (which has to use the new syntax)).
On balance, the split solution would seem preferable because of its
simpler locking, and lower memory and CPU requirements, because of
which it should scale much better. The shared solution furthermore 
has the advantage that the protection does not just cover a linger 
time, but the entire runtime of the server.



"ID space"

The assumption is that there will be hundreds or thousands
of connections at any one time, in a numberspace of 2^32,
i.e. more than 4,000 million.  Even at capacity (we actually
have about max_connections connections), this will be a sparse
use of the ID space.

Further, IDs increment until roll-over (at 2^32-1).

It is assumed that as a general rule, there will be few
extremely long-lived connections (replication threads,
perhaps monitoring, etc.).  For the most part, connections
will be made, used for a while, and then broken.  It is
therefore assumed that at the time of the roll-over / wrap-
around, most connection-IDs will be in the top half of the
numberspace (to say the very least), with only a few long
lived outliers deviating from this (e.g. replication thread
created at server start with e.g. connection-ID 3, never
disconnected, still holds that ID).  The design was made
with this scenario in mind, as well as safety of operation
in other, non-optimum scenarios.



"ID selection"

Several methods to select IDs and guards against duplicates
(i.e. to make sure that the new ID does not coincide with any
of the currently in-use IDs), for instance the ID space could
be sub-partitioned, up to the case where each thread has their
own "sub-space" that rolls over individually and specifically
does not need a lock for protection from other threads.
Unfortunately, this mitigates the situation in that we'd have
to sub-divide equally, leading to a very quick roll over within
each subpartition when max_connections is configured to large
values.

A more realistic approach is to leverage the infrastructure
already in place: in trunk, atomic primitives are used to
create a new connection-ID, which are then inserted into a
std::set (an ADT that enforces uniqueness and is usually
implemented as a tree). We can maintain the speed- and
locking characteristics by simply changing the set so that
it is sorted by ID. This would give us pseudocode along the
lines of,

thd->id= get_next_id(); // atomic

lock();
while (true) {
  if (thd_list->insert(thd) == SUCCESS)
    break;
  thd->id= get_next_id();
}
unlock();

As ID generation would remain unchanged (and atomic), and
we'd still do one insertion into the thread-list which would
remain the same data-type, no grave changes in terms of memory
consumption, behavior, or performance are expected.

* As a theoretical consideration, a pathological case could be
artificially created where the "next ID" is at the beginning
of a large range of in-use IDs, and a lot of connections are
suddenly made at the same time, so all the new connections
would start of with consecutive optimistic/prospective IDs
which all happen to be in the in-use range that lies ahead.
Even then the first connection that manages to get the lock
for the thread_list will then traverse the in-use block, and
take the first free ID behind it. The other new connections
can than benefit from the updated global counter and need to
update their ID only once, rather than having to each scan
the in-use range themselves. Even so, this case is highly
academic in its assumptions; even with a very high value for
max_connections, there is a natural trade-off in that if many
connections are still in use, then not a lot of new connections
could be made that "congest" ID-creation; and conversely, if
many new connections are made, no large clusters or ranges of
in-use IDs should be expected ahead (rather than behind) the
current ID counter in the already sparse ID-space.