WL#4677: Unique Server Ids for Replication Topology (UUIDs)

Affects: Server-5.6   —   Status: Complete

RATIONALE
=========
* One less step to set up replication
* Make outside discovery of the replication topology less error-prone,
  since servers will "always" have different IDs

SUMMARY
=======
1. Generate and store a real UUID in every server instead of 
   only using the user-provided "server-id".
2. Ship the server-uuid from the master to the slave as 
   part of the slave-registration.
3. Make the UUID of the slave's current master visible on the
   slave with a SELECT or SHOW command.

DEFINITION / CLARIFICATION
==========================
- The "current master of a slave" is the MySQL server that the slave 
  will start to replicate from when the START SLAVE command is 
  executed.  The current master of a slave can be changed at any
  moment by using the CHANGE MASTER command.

LIMITATIONS
===========
- The slave will not know the UUID of its current master until
  the START SLAVE command has been executed.

BACKGROUND
==========
Jan reports that, for the Enterprise Manager (MEM) team, discovering the
replication topology would be much easier if the ID were globally unique
instead of being unique only per replication setup (or even just per
replication path).

MEM has to answer questions such as "who are the current parents of
this slave?", and answering them is currently very fragile.

We can only try to match the output of SHOW PROCESSLIST with that of
SHOW SLAVE STATUS, and the two may not match if NAT, firewalls, or VRRP
are in use and change the IP address. Such setups are very common.

Right now, MEM tries to solve this by storing a UUID in a mysql.inventory
table on each MySQL server. When asked for a slave's parent, MEM reads the
slave's master.info, takes the slave's password from it, and logs in to the
master to read the master's server UUID.

OPEN QUESTIONS
==============
- It is unclear if MEM will accept the limitation above.
  -- Lars Thalmann, 2008-12-12

COMMENTS
========
On Fri, Dec 12, 2008 at 07:01:00PM +0000, Mark Leith wrote:
> Will it store it somewhere after it has first connected to its
> master (i.e, store it in the master.info until the next CHANGE
> MASTER command)?  Or will it be cached until the slave disconnects,
> and then have to re-determine it with another START SLAVE?

On Fri, Dec 12, 2008 at 12:42:56PM -0800, Kay Röpke wrote:
> I agree, if stopping the slave leads to the master's UUID being
> inaccessible this would make repl topology discovery awkward.
>
> However, I'm fine that you need to start the slave at least once for the
> uuid to show up in the slave status, if it hasn't been started we could
> use the same mechanism to discover the master uuid as we do now: reading
> the master.info and logging in with those credentials, provided the uuid
> is accessible via SQL, of course.

REFERENCES
==========
BUG#33815: This bug should also be closable as part of this WL; it only
requires some better error/warning messages.

See also BUG#16927.

CAUTIONS
========
1. The change must be backward compatible.
2. The UUID will probably also be used outside of replication.

OPEN QUESTIONS
==============
1. When is the server's UUID generated?
   [ ] When a server is being installed.
   [ ] When a server runs for the first time.
   [ ] The UUID can be set by administrators at any time.
   [X] When the server starts and initializes, the UUID is generated
       automatically if it cannot be read correctly from the UUID file.

2. Where is the server's UUID stored?
   [X] In a file in the data directory.
   [ ] In a table.

3. Can replication run without the server_id option if a UUID is set?
   [ ] Yes. server_id is completely replaced by the UUID when both master
       and slave support UUIDs; server_id is still needed if either the
       master or the slave does not support UUIDs.
   [X] No. Replication cannot run without server_id, because server_id is
       used in log events. Allowing it would be riskier and require more
       work.

4. How is the server's UUID shown?
   [X] Through a global variable named 'server_uuid'; in addition,
       SHOW SLAVE STATUS shows the master's UUID if it exists.

5. How is the master's UUID handled by 'CHANGE MASTER'?
   [X] The UUID can NOT be set as an option of 'CHANGE MASTER', and it is
       cleared by this operation. Master_Server_Id should also be cleared.
   [ ] Alternative: Master_Server_Id keeps the old value and is not
       cleared, so Master_UUID should keep the old value as well.

6. How is the master's UUID handled by 'STOP SLAVE [IO_THREAD]'?
   [X] The value is kept.

7. Who has the privilege to read the variable?
   [X] Anyone.
   [ ] Administrator
   [ ] Other

SPECIFICATION
=============
1. Each server shall have a file that permanently stores the server's
   automatically generated properties. The file is named 'auto.cnf' and is
   saved in the top-level data directory (@@DATADIR). It has the same format
   as my.cnf:
   - There is a 'property' section, and the server's UUID is saved in this
     section as server_uuid.
   - The file looks like this:

       [property]
       server_uuid='A UUID'
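
To make the auto.cnf format concrete, here is a minimal C++17 sketch of reading server_uuid back out of such a file. It assumes the layout shown above (one key=value per line, '#' or ';' comment lines, optional quotes around the value). The function name parse_server_uuid is hypothetical; the design itself relies on the server's existing option-file handling (load_defaults/handle_options) rather than code like this.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Strip leading/trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string &s) {
  const char *ws = " \t\r";
  size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Hypothetical helper: return the server_uuid value from the [property]
// section of auto.cnf-style text, or std::nullopt if it is not present.
std::optional<std::string> parse_server_uuid(const std::string &text) {
  std::istringstream in(text);
  std::string line;
  bool in_property = false;  // are we inside the [property] section?
  while (std::getline(in, line)) {
    line = trim(line);
    if (line.empty() || line[0] == '#' || line[0] == ';') continue;
    if (line.front() == '[' && line.back() == ']') {
      in_property = (trim(line.substr(1, line.size() - 2)) == "property");
      continue;
    }
    if (!in_property) continue;
    size_t eq = line.find('=');
    if (eq == std::string::npos) continue;
    if (trim(line.substr(0, eq)) != "server_uuid") continue;
    std::string value = trim(line.substr(eq + 1));
    // Remove one pair of matching single or double quotes.
    if (value.size() >= 2 && (value.front() == '\'' || value.front() == '"') &&
        value.back() == value.front())
      value = value.substr(1, value.size() - 2);
    return value;
  }
  return std::nullopt;
}
```

For example, parse_server_uuid("[property]\nserver_uuid='abc'\n") yields "abc", while a server_uuid line outside the [property] section is ignored.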
   
2. Each server shall have a global, read-only variable named 'server_uuid'
   that is available while the server is running. Users can find the
   server's UUID with 'SELECT' (e.g. SELECT @@server_uuid) or
   'SHOW VARIABLES'.

3. At startup, mysqld checks whether its UUID exists. If it does not, mysqld
   generates a UUID, stores it in the file, and reports a warning that no
   UUID existed and a new one was created. The UUID is then loaded into the
   'server_uuid' variable.
   - The UUID usually exists, except when the server is started for the
     first time.
   - To keep the old UUID when reinstalling a server (e.g. on a different
     machine), copy the old UUID file into the data directory.
   - The UUID is generated in the same way as by the UUID() function.
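
The startup check above can be sketched in C++17 as a single load-or-create function. This is an illustration under assumptions, not the server code: load_or_create_uuid, read_stored_uuid, and UuidLoadResult are hypothetical names; the file is assumed to use exactly the two-line [property] layout shown earlier; a stored value is accepted only if it has the 36-character length of a textual UUID; and the generator is passed in as a callback so the sketch does not depend on how the UUID is produced.

```cpp
#include <fstream>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Outcome of the startup check (hypothetical type).
struct UuidLoadResult {
  std::string uuid;
  bool generated;  // true if a new UUID had to be generated and stored
};

// Reads server_uuid from a file in the assumed layout:
//   [property]
//   server_uuid='...'
// Returns an empty string if the file or the line is missing.
static std::string read_stored_uuid(const std::string &path) {
  std::ifstream in(path);
  const std::string prefix = "server_uuid='";
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind(prefix, 0) == 0 && line.size() > prefix.size() &&
        line.back() == '\'')
      return line.substr(prefix.size(), line.size() - prefix.size() - 1);
  }
  return "";
}

// Loads the stored UUID, or generates, stores, and reports a new one.
UuidLoadResult load_or_create_uuid(
    const std::string &path, const std::function<std::string()> &generate) {
  std::string uuid = read_stored_uuid(path);
  if (uuid.size() == 36)  // 32 hex digits + 4 dashes
    return {uuid, false};

  uuid = generate();
  std::ofstream out(path, std::ios::trunc);
  out << "[property]\nserver_uuid='" << uuid << "'\n";
  out.flush();
  if (!out) throw std::runtime_error("cannot write " + path);
  std::cerr << "Warning: no valid UUID found in " << path
            << ", generated a new one: " << uuid << '\n';
  return {uuid, true};
}
```

Calling it twice on the same path generates a UUID only on the first call; the second call returns the stored value, which is what keeps a server's UUID stable across restarts.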

4. When a slave connects to its master, it sends its UUID to the master,
   which stores the slave's UUID in a user variable of the dump thread
   named '@slave_uuid'.
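
As a sketch of the slave side of this exchange, the C++17 helper below builds the statement the slave would send on its connection to the master. The names make_slave_uuid_statement and is_valid_uuid_text are hypothetical. Validating the 8-4-4-4-12 hexadecimal layout first means the value contains only hex digits and dashes, so it can be placed inside single quotes without escaping.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: checks the 8-4-4-4-12 hex layout of a textual UUID.
bool is_valid_uuid_text(const std::string &s) {
  if (s.size() != 36) return false;
  for (size_t i = 0; i < s.size(); ++i) {
    bool dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_pos ? s[i] != '-'
                 : !std::isxdigit(static_cast<unsigned char>(s[i])))
      return false;
  }
  return true;
}

// Builds the statement the slave would send after connecting, or
// std::nullopt if the UUID is malformed (nothing is sent in that case).
std::optional<std::string> make_slave_uuid_statement(const std::string &uuid) {
  if (!is_valid_uuid_text(uuid)) return std::nullopt;
  return "SET @slave_uuid= '" + uuid + "'";
}
```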

5. Slaves' UUIDs can be shown with 'SHOW SLAVE HOSTS'.

6. After the slave I/O thread has connected to its master, it retrieves the
   master's UUID and saves it in the Master_info object and the
   'master.info' file.
   - The slave I/O thread reports an error and aborts if the master's UUID
     is equal to its own, unless the --replicate-same-server-id option is
     set.
   - The slave I/O thread reports a warning if the master has no UUID.
   - The slave I/O thread reports a warning if the master's UUID differs
     from the previously stored one although 'CHANGE MASTER ...' has not
     been executed.
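
The three checks above can be expressed as a small pure function, which makes their precedence explicit. This C++ sketch is an illustration under assumptions: the enum and the function check_master_uuid are hypothetical names, and an empty string stands for "no UUID" (the master does not report one, or the stored value has been cleared).

```cpp
#include <string>

// Outcome of the I/O thread's UUID check (hypothetical type).
enum class UuidCheck { kOk, kFatalSameUuid, kWarnNoMasterUuid, kWarnUuidChanged };

// master_uuid:        value read from the master ("" if it has none)
// own_uuid:           this slave's server_uuid
// stored_master_uuid: value kept in Master_info ("" after CHANGE MASTER)
UuidCheck check_master_uuid(const std::string &master_uuid,
                            const std::string &own_uuid,
                            const std::string &stored_master_uuid,
                            bool replicate_same_server_id) {
  if (master_uuid.empty())
    return UuidCheck::kWarnNoMasterUuid;  // master does not support UUIDs
  if (master_uuid == own_uuid && !replicate_same_server_id)
    return UuidCheck::kFatalSameUuid;     // abort the I/O thread
  if (!stored_master_uuid.empty() && stored_master_uuid != master_uuid)
    return UuidCheck::kWarnUuidChanged;   // changed without CHANGE MASTER
  return UuidCheck::kOk;
}
```

In the kWarnUuidChanged case, the caller would still store the new UUID in Master_info after logging the warning.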

8. The master's UUID is shown by 'SHOW SLAVE STATUS'.

9. After 'STOP SLAVE [IO_THREAD]' succeeds, the master's UUID is still kept
   in the slave status.

10. After 'CHANGE MASTER ...' succeeds, the master's UUID and server id
    are cleared if MASTER_HOST and/or MASTER_PORT differ from the old
    values.
    - MASTER_HOST can be either an IP address or a domain name. Domain
      names are not translated to IP addresses when deciding whether two
      hosts are equal; for example, 'localhost' is considered different
      from '127.0.0.1'.

11. After 'RESET SLAVE' succeeds, the 'master.info' file is deleted as
    usual. However, just like MASTER_HOST and MASTER_PORT, the master's
    UUID is still kept in memory and in the output of 'SHOW SLAVE STATUS'.
    Keeping Master_UUID in memory lets us know that the slave has not
    changed its master: if Master_UUID in memory is NULL, the slave has
    either changed its master or is connecting to a master for the first
    time.
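
The CHANGE MASTER rule above can be sketched as follows. MasterIdentity and apply_change_master are hypothetical stand-ins for the relevant Master_info fields and logic. The sketch assumes the caller passes the old host or port when CHANGE MASTER does not specify one, and it uses an empty string and 0 for a cleared UUID and server id. Hosts are compared as literal strings, so no DNS lookup is involved.

```cpp
#include <string>

// Minimal stand-in for the Master_info fields that matter here.
struct MasterIdentity {
  std::string host;
  unsigned port;
  std::string master_uuid;   // "" = unknown / cleared
  unsigned long master_id;   // 0 = unknown / cleared
};

// Clears the cached UUID and server id only if the host or port changes.
// "localhost" and "127.0.0.1" are different strings, so they count as a change.
void apply_change_master(MasterIdentity &mi, const std::string &new_host,
                         unsigned new_port) {
  if (new_host != mi.host || new_port != mi.port) {
    mi.master_uuid.clear();
    mi.master_id = 0;
  }
  mi.host = new_host;
  mi.port = new_port;
}
```

RESET SLAVE would not go through this function, which matches the rule that the master's UUID stays in memory after RESET SLAVE.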

DESIGN
======
1. Initialize UUID
   int init_server_auto_options();
   RETURN:
     0 on success, or 1 if an error occurred.
   DESCRIPTION:
     Loads all of the server's auto-generated options from the 'auto.cnf'
     file. Currently server_uuid is the only auto-generated option; it is
     loaded into the global variable server_uuid. If it does not exist in
     'auto.cnf', it is generated first and then stored in the file.
     A warning is always reported when a UUID is generated.
   CALLER:
     init_server_component().

2. int generate_server_uuid()
   RETURN:
     0 on success, or 1 if an error occurred.
   DESCRIPTION:
     Generates a UUID and saves it into the global variable server_uuid.
   CALLER: init_server_auto_options().
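
MySQL documents UUID() as returning a version 1 (time-based) UUID as described in RFC 4122. For readers unfamiliar with that layout, the C++17 sketch below builds one by hand. It is not MySQL's implementation: the name make_uuid_v1 is hypothetical, and it assumes a random 48-bit node id (with the multicast bit set, as RFC 4122 suggests when no IEEE 802 address is used) and a random clock sequence.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Builds a version-1 UUID string: time_low-time_mid-time_hi_and_version-
// clock_seq-node, all in lowercase hex.
std::string make_uuid_v1() {
  // 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
  constexpr uint64_t kGregorianOffset = 0x01B21DD213814000ULL;
  auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
  uint64_t ticks = static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch)
          .count() / 100);
  uint64_t ts = ticks + kGregorianOffset;

  std::random_device rd;
  std::mt19937_64 rng(rd());
  uint16_t clock_seq = static_cast<uint16_t>(rng() & 0x3FFF);  // 14 bits
  // Random 48-bit node with the multicast bit (LSB of first octet) set.
  uint64_t node = (rng() & 0xFFFFFFFFFFFFULL) | 0x010000000000ULL;

  uint32_t time_low = static_cast<uint32_t>(ts & 0xFFFFFFFFULL);
  uint16_t time_mid = static_cast<uint16_t>((ts >> 32) & 0xFFFF);
  uint16_t time_hi_version =
      static_cast<uint16_t>(((ts >> 48) & 0x0FFF) | 0x1000);  // version 1
  uint16_t clock_seq_variant =
      static_cast<uint16_t>(clock_seq | 0x8000);  // RFC 4122 variant (10xx)

  char buf[37];  // 36 characters + terminating NUL
  std::snprintf(buf, sizeof buf, "%08x-%04x-%04x-%04x-%012llx",
                static_cast<unsigned>(time_low),
                static_cast<unsigned>(time_mid),
                static_cast<unsigned>(time_hi_version),
                static_cast<unsigned>(clock_seq_variant),
                static_cast<unsigned long long>(node));
  return buf;
}
```

The result has the 8-4-4-4-12 form, with the version nibble '1' at position 14 and a variant nibble of 8, 9, a, or b at position 19.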
     
3. Sync master's UUID to slave.
   int get_master_uuid(MYSQL* mysql, Master_info* mi);
   RETURN:
     0: Success
     1: Fatal error
     2: Network error
   DESCRIPTION:
     Gets the master's UUID and stores it in mi when the slave I/O thread
     starts.
     If the master's UUID is the same as the slave's own UUID (and
     --replicate-same-server-id is not set), it reports an error saying so
     and stops the slave I/O thread.
     If the master has no UUID, it reports a warning that the master does
     not support UUIDs.
     If the slave has not changed its master but the master's UUID has
     changed, it reports a warning saying so.
   CALLER:
     handle_slave_io().
   SEE ALSO: get_master_version_and_clock().

4. Show and save master's UUID.
   The following class and functions should be changed:
   class Master_info;
     A member master_uuid is added to Master_info to store the master's UUID.
   bool show_master_info(THD* thd, Master_info* mi);
     Master_info's master_uuid should be shown as 'Master_UUID'.
   int Master_info_file::do_flush_info();
     The master's UUID is saved in the 'master.info' file.
   int init_master_info();
     Loads the master's UUID into mi from the 'master.info' file.
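
Because the change must be backward compatible, loading master.info must also cope with files written before Master_UUID existed. The C++ sketch below shows that idea on a deliberately simplified, hypothetical line-per-value file; it is not the real master.info layout, and serialize_info and load_master_uuid are illustrative names. The UUID is appended as the last line, and a file without that line simply yields an empty (unknown) UUID.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Writes the pre-existing fields one per line, then the master's UUID.
std::string serialize_info(const std::vector<std::string> &fields,
                           const std::string &master_uuid) {
  std::ostringstream out;
  for (const std::string &f : fields) out << f << '\n';
  out << master_uuid << '\n';
  return out.str();
}

// Skips `expected_fields` old-style lines, then reads the UUID line if
// present. Older files lack that line, so "" (unknown) is returned
// instead of reporting a load error.
std::string load_master_uuid(const std::string &text, size_t expected_fields) {
  std::istringstream in(text);
  std::string line;
  for (size_t i = 0; i < expected_fields; ++i)
    if (!std::getline(in, line)) return "";
  if (std::getline(in, line)) return line;
  return "";
}
```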

5. Sync slave's UUID to master.
   The following function should be changed:
   int connect_to_master();
     Use 'SET @slave_uuid' to send the slave's UUID when connecting to a
     master.

6. Show slave
   The following function should be changed:
   bool show_slave_host();
     Add the slave's UUID, stored in SLAVE_INFO, to the result of
     'SHOW SLAVE HOSTS'.

Add global variable
-------------------
extern char server_uuid[UUID_LENGTH+1];

Add class variable
------------------
- class Master_info
  char master_uuid[UUID_LENGTH+1];

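1. int init_server_auto_options();
{
   Call 'load_defaults' to read all options from 'auto.cnf'.
   Call 'handle_options' to save all options into variables.
   Return 1 if either step fails.

   if (option_uuid exists and was read correctly)
   {
     copy it into server_uuid;
   }
   else
   {
     /* First start, or 'auto.cnf' is missing or unreadable. */
     Call 'generate_server_uuid' to generate the server's UUID;
       return 1 on failure.
     Report a warning that a new UUID was generated.
     Call 'flush_auto_options' to save it into 'auto.cnf';
       return 1 on failure.
   }

   return 0;
}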

2. int generate_server_uuid();
{
  /* Fake a thread, since none exists yet during server initialization */
  THD *thd= new THD;

  initialize thd;
  Item_func_uuid *func_uuid= new (thd->mem_root) Item_func_uuid();
  initialize func_uuid;
  String uuid;
  func_uuid->val_str(&uuid);      /* generate a UUID into 'uuid' */
  delete thd;
  /* Remember that we don't have a THD */
  my_pthread_setspecific_ptr(THR_THD, 0);
  copy uuid into server_uuid and terminate it at server_uuid[UUID_LENGTH];
  return 0;
}

3. int get_master_uuid(MYSQL* mysql, Master_info* mi);
{
  int ret= 0;
  const char *master_uuid= NULL;   /* points into the result row, if any */

  if (!mysql_real_query(mysql,
                        STRING_WITH_LEN("SHOW VARIABLES LIKE 'SERVER_UUID'")))
  {
    store the result; if it has a row, set master_uuid to its Value column;
  }

  if (master_uuid != NULL)
  {
    if (!strcmp(master_uuid, ::server_uuid) && !replicate_same_server_id)
    {
      /* Master and slave have the same UUID: fatal error */
      mi->report(ERROR_LEVEL, ER_SLAVE_FATAL_ERROR, ER(ER_SLAVE_FATAL_ERROR),
                 error message);
      ret= 1;
    }
    else
    {
      /* The master's UUID changed although CHANGE MASTER was not executed */
      if (mi->master_uuid[0] != 0 && strcmp(mi->master_uuid, master_uuid))
        sql_print_warning(warning message);
      strncpy(mi->master_uuid, master_uuid, UUID_LENGTH);
      mi->master_uuid[UUID_LENGTH]= 0;
    }
  }
  else if (mysql_errno(mysql))
  {
    if (is_network_error(mysql_errno(mysql)))
    {
      /* Network error */
      mi->report(WARNING_LEVEL, mysql_errno(mysql),
                 warning message, mysql_error(mysql));
      ret= 2;
    }
    else
    {
      /* Fatal error */
      mi->report(ERROR_LEVEL, ER_SLAVE_FATAL_ERROR, ER(ER_SLAVE_FATAL_ERROR),
                 error message);
      ret= 1;
    }
  }
  else
  {
    /* The query succeeded but returned no row: the master has no UUID */
    mi->report(WARNING_LEVEL, ER_UNKNOWN_SYSTEM_VARIABLE,
               "Unknown system variable 'SERVER_UUID' on master, "
               "maybe it is a *VERY OLD MASTER*.");
  }
  free the result;
  return ret;
}