WL#2860: INFORMATION_SCHEMA table for SHOW SLAVE HOSTS

Affects: Server-9.x — Status: On-Hold

Description
High Level Architecture

SUMMARY
-------
Master should store information about slaves which have been connected
in a table instead of memory.

REQUIREMENTS
------------
- (Elliot) The merlin folks definitely want to see slaves which are
  not currently connected. Since Merlin monitors health, it is valuable
  for it to connect to a master, and then draw some picture with a big
  alert showing the slaves that were not able to be contacted

DECISIONS REGARDING THIS WL
---------------------------
- Elliot 9 Nov 2005:
  In the September 2005 Chicago dev-mtx planning meeting, during a
  Merlin discussion, it was decided that we should change SHOW SLAVE
  HOSTS to store information about slaves which have been connected in a
  table instead of memory.  This should be done in 5.1, and is needed
  for Merlin advisors.
- Lars 21 Feb 2006:
  According to Scrum meeting this should be in 5.2.

OPEN ISSUES
-----------
- How to purge table?
- What format of table?
- How to update table?

DRAFT (TO BE DISCUSSED)
----------------------

1. What should be done

MySQL is extended with a new INFORMATION_SCHEMA.SLAVEHOSTS table.
The table has the same columns as displayed by SHOW SLAVE HOSTS
statement plus an additional column CONNECTED which defines whether
the connection to the slave is OK.

  Column name        Description
  -----------        -----------
  MASTER_ID          integer
  SERVER_ID          integer
  HOST               varchar
  PORT               integer
  USER               varchar
  PASSWORD           varchar
  RPL_RECOVERY_RANK  integer
  CONNECTED          bool

REMARK.
  SLAVEHOSTS and SHOW SLAVE HOSTS display slaves which are "registered"
  within the master. Currently, registration of a slave occurs each time
  the slave connects to the master. A slave can become unregistered only
  on master restart. Because of that SLAVE HOSTS generally does not display
  the overall "replication topology" (as defined by which master and
  slaves are installed and how they are configured). E.g.,
  - Suppose the following occurs: slave Si shutdowns, master M shutdowns,
    master M restarts. In this case, Si will not be displayed by
    SLAVE HOSTS until Si restart.
  - Suppose Si is uninstalled. In this case Si will stay visible by
    SLAVE HOSTS until master M shutdown.
  Note also, that it is not quite obvious how to add a direct slave
  unregistering (consider the case when slave wants to unregister itself
  but cannot connect to master).

REMARK
  When connections are to be checked?
  One alternative is to check connections during SLAVEHOSTS table
  filling (like other INFORMATION_SCHEMA tables, SLAVEHOSTS is filled
  when a reference to this table is encountered in a query). But this
  can make the table filling too slow (checking is based on waiting
  for response from slave).
  Other alternative is to check periodically the connections to all
  registered slaves in a separate thread and to use the result of the
  last check in the SLAVEHOST.
  Yet one more alternative is to use the result of each attempt of the
  master to connect to a slave (e.g. when sending an event).

2. Merlin's case

The overall "replication topology" is to be described by an ordinary
table, say 'repology', created within master server. It defines which
slaves are configured to this master. Something like this:

  CREATE TABLE repology (name VARCHAR, server_id INT);

The table is managed by an administrator who adds/deletes entries
according to changes in the topology):

  INSERT INTO repology ('slave01', 101);
  INSERT INTO repology ('slave02', 102);
  INSERT INTO repology ('slave03', 103);
  ...
  DELETE FROM repology WHERE name='slave12';
  ...

To get the current state of slaves some kind of outer join can be used,
e.g.

  SELECT * FROM repology AS t1
           LEFT JOIN INFORMATION_SCHEMA.SLAVEHOSTS AS t2
           ON (t1.server_id=t2.server_id);
  +---------+-----------+-----------+-----+-----------+
  | name    | server_id | SERVER_ID | ... | CONNECTED |
  +---------+-----------+-----------+-----+-----------+
  | slave01 |       101 |       101 | ... |         1 | - connected
  | slave02 |       102 |       102 | ... |         0 | - registered, not 
connected
  | slave03 |       103 |      NULL | ... |      NULL | - not registered
  | ...     |       ... |       ... | ... |       ... |
  +---------+-----------+-----------+-----+-----------+


REMARK.
  Using right join instead of the left one above will show the slaves
  which are registered to the master but are not included into the repology:

  SELECT * FROM repology AS t1
           RIGHT JOIN INFORMATION_SCHEMA.SLAVEHOSTS AS t2
           ON (t1.server_id=t2.server_id);
  +---------+-----------+-----------+-----+
  | name    | server_id | SERVER_ID | ... |
  +---------+-----------+-----------+-----+
  | slave01 |       101 |       101 | ... |
  | slave02 |       102 |       102 | ... |
  | ...     |       ... |       ... | ... |
  | NULL    |      NULL |       200 | ... | - not included in repology
  | ...     |       ... |       ... | ... |
  +---------+-----------+-----------+-----+

SOME MORE IDEAS FROM COMMUNITY
-
http://code.openark.org/blog/mysql/impressions-from-mysql-conf-2011-part-iii-bof-replication

SEMANTICS OF SHOW SLAVE HOSTS
=============================

Before indulging into details of implementation one fundamental issue must be
clarified: what "SHOW SLAVE HOSTS" statements is supposed to produce. There are
two general understandings which are not clearly distinguished in the docs.

a) "SHOW SLAVE HOSTS" generates list of slaves connected (potentially or
actually) to the host on which query was issued.

b) "SHOW SLAVE HOSTS" generates global table of all known master->slave
connections. That is, the list produced by the query is *the same* on all hosts
participating in replication.

The current code and some formulations in the reference manual suggest that
currently we (try to) support meaning b) although the support is faulty in many
respects.

One could argue in fawour of a) however:

1. It might be more natural that "SHOW SLAVE HOSTS" issued on master M talks
only about slaves directly connected to M.

2. Implementing meaning b) is much more complex. Basically one needs to maintain
a distributed database of all known master->slave connections. Some kind of
replication would be necessary so that any host in the replication network knows
about a slave connecting/disconnecting to/from master in any other part of that
network. For example, consider replication topology:

   (A <- ) B <- C -> D
              
If new host A connects to B as slave, D would have to know about that, so that
"SHOW SLAVE HOSTS" on D reports that A is connected to B.

3. Even when "SHOW SLAVE HOSTS" has meaning a), it is possible to discover
global replication topology. To do that, it is enough to connect to all involved
masters, execute "SHOW SLAVE HOSTS" on each of them and compile the global list
from the results.

Of course, changing meaning of the query must be done carefully as some clients
might rely on the old meaning. However, this is probably not such big problem
because: 
- meaning b) is not clear from the documentation,
- it is not correctly implemented in the  current code anyway.

ACTION: someone should decide which meaning of "SHOW SLAVE HOSTS" we are going
to implement.