WL#9017: Group Replication: Secure distributed recovery credentials

Affects: Server-5.7   —   Status: Complete

Group Replication provides a distributed recovery process to
synchronize new member with the group, WL#6837.
This process is triggered on the new member, joiner, by establishing
a asynchronous replication connection to one online member of the
group and fetching all missing data until the point that the joiner
joined the group.

This asynchronous replication connection needs to be allowed to be
established using credentials, more precisely a username - password
tuple.
Currently these credentials are stored on MySQL server configuration
file and/or set by SET GLOBAL VARIABLE command, which causes a
security issue:
  persistence: the password is stored in plain text on a
               configuration file.

To fix these security persistence issue, this worklog will implement
the following changes:

  1. The user will always need to create the recovery channel using the CHANGE
     MASTER command. The plugin will not set any default value for the user and
     password field in CHANGE MASTER.

  2. Restrict the addition of only MASTER_USER and MASTER_PASSWORD for
     the recovery channel on the server side. All other settings for the
     change master are added on the plugin side, as it is done today.

  3. The password value used on CHANGE MASTER commands for the recovery
     channel must not be logged as plain text in any log files.
FR1: Username/password must not be stored on configuration file

FR2: The following plugin options must be eliminated:
     group_replication_recovery_user
     group_replication_recovery_password

FR3: Group Replication must not have regressions.

FR4: If values passed for (user, password) is not specified for the recovery 
channel
     created, then the for the current group, the default credentials (i.e. "","")
     will be used.

FR5: The CHANGE MASTER for the recovery channel should only be able to set the 
     MASTER_USER ans MASTER_PASSWORD. No other parameter should be settable.

FR6: password value of change master command must not be logged to text
     logs: error log, query log, slow log.

FR7: No CHANGE MASTER command will be logged to binary log with and without
     Group Replication running.

FR8: Password value should not be visible in the performance schema tables and in    
     the .mysql_history file. Also "SHOW PROCESSLIST" should not display the plain 
     text password value.
We need to use the CHANGE MASTER syntax for creating the recovery channel. Also 
if the user does not execute a change master to create the channel, the recovery 
channel is to be created on the plugin side (something that happens already).

Also the CHANGE MASTER for the recovery channel executed on the server, should 
only be allowed to change the MASTER_USER and MASTER_PASSWORD values. Rest all 
the other parameters of the recovery channel will be set by the plugin itself.


Server Side :
**************

1. The DBA will always to set username and password using CHANGE MASTER
   command, more precisely through MASTER_USER and MASTER_PASSWORD arguments.
   In case the DBA does not execute CHANGE MASTER command, the default values
   will be "".

2. Allow only the addition of the MASTER_USER and MASTER_PASSWORD values for the
   recovery channel. This should throw error message in case any other parameter 
   is being set by the CHANGE MASTER syntax.

   Error message to be used is :

   ER_SLAVE_CHANNEL_OPERATION_NOT_ALLOWED
   "%-.192s cannot be performed on channel '%-.192s'."

3. The queries for change master should be verified as not getting logged in any 
   of the log files, i.e. binary log, query_log, error log, slow log.

4. The reset slave command for the recovery channel will reset the information
   of the recovery channel and also remove the logs of this channel.

5. Currently at the end of the recovery, the plugin calls reset slave from the
   plugin, which resets the recovery channel i.e. removes the logs and resets 
   the info objects. However, going forward we need to prevent the resetting of 
   the info objects and only purge logs. We need to add support to achieve this 
   on the server so that only the logs are purged.
 
Plugin Side :
**************

1. Also since the channel is already created by the server, we should not go  
   ahead with the channel creation triggered by the plugin (START GR). However, 
   the options such as SSL should be updated for the recovery channel.

2. The default values for the channel use and password (i.e. "", "") will
   not be passed by the plugin.

3. Remove the recovery_user and recovery_password server variable from the GR
   code.

4. Plugin should only purge the logs at the end of the recovery process.
The low level design of the WL -

Server side : 
==============

File : rpl_channel_service_interface.h


  Add function : int channel_only_purge_logs(const char* channel)

  The recovery code calls purge_logs at the end of the recovery process and
  purge_logs internally calls reset_slave on the server.  
  
  The reset_slave can also be initiated by the DBA. So there needs to be a 
  method to actually just call purge_realy_logs() on the server and not reset 
  slave.

  /**
    Purges only the channel logs. Does not reset the channel information.

    @param channel  the channel whose logs needs to be purged.

    @return the purge operation status
      @retval 0     OK
      @retval !=0   Error
  */
  int channel_only_purge_logs(const char* channel);


File : sql/rpl_slave.cc

    Add function : 
 
    // This method is used to check if the user is trying to udpate any other
    // option for the change master apart from the MASTER_USER and
    // MASTER_PASSWORD. In case user tries to update any other parameter apart
    // from these two, this method will return error.

    // Return value 
    // true - the CHANGE MASTER is updating a unsupported parameter for the
    //        recovery channel.
    //
    // false - Everything is fine. The CHANGE MASTER can execute with the
    //         given option(s) for the recovery channel.

    static bool is_invalid_change_master_for_group_replication_recovery(const
                                                                    
                                                           LEX_MASTER_INFO*
                                                           lex_mi)

    {
      ....
      ....
    }


    Modify function : 

    // Add changes in the change_master method to allow executing the change
    // master command only for the recovery channel if only user and password
    // value are set.
    change_master_cmd(THD* thd)
    {  
      ...

      if (channel_map.is_group_replication_channel_name(lex->mi.channel) &&
          !channel_map.is_group_replication_channel_name(lex->mi.channel, true))
      {
        // Add check if any other option for the change master is being changed
        //  apart from MASTER_USER and MASTER_PASSWORD
        if (is_change_master_option_for_group_replication_recovery(lex_mi))
        {
          my_error(ER_SLAVE_CHANNEL_OPERATION_NOT_ALLOWED, MYF(0),
                   "CHANGE MASTER", mi->get_channel());
          res= true;
          goto err;
        }
      }
      ....
    }


Plugin side : 
==============

File - src/plugin/group_replication/src/plugin.cc

1. Remove these two plugin options and replace the use of these two variables
   with the values read from the system tables.

    Parameters to be removed :

    1. group_replication_recovery_user
    2. group_replication_recovery_password


  //Recovery module variables

  static MYSQL_SYSVAR_STR(
    recovery_user,                              /* name */
    recovery_user_pointer,                      /* var */
    PLUGIN_VAR_OPCMDARG,                        /* optional var */
    "The user name of the account that recovery uses for the donor connection",
    check_recovery_con_user,                    /* check func*/
    update_recovery_con_user,                   /* update func*/
    "root");                                    /* default*/

  static MYSQL_SYSVAR_STR(
    recovery_password,                          /* name */
    dummy_recovery_password,                    /* var */
    PLUGIN_VAR_OPCMDARG,                        /* optional var */
    "The password of the account that recovery uses for the donor connection",
    check_recovery_con_password,                /* check func*/
    update_recovery_con_password,               /* update func*/
    "");

  Also remove the associated functions for update and check for these two 
  variables. 

2. Changes plugin to execute PURGE LOGS after distributed recovery it is done
   instead of RESET SLAVE.

File : src/plugin/group_replication/src/replication_threads_api.cc

Modify function : Replication_thread_api::purge_logs(bool only_purge_logs)
                  {
                    .... 
                    int error= 0;
                    if (only_purge_logs)
                      error= channel_only_purge_logs(interface_channel);
                    else
                      error= channel_purge_queue(interface_channel, true);
                    ....
                  } 

3. The plugin will not pass any value for the user and password for the
   recovery channel. So that needs to be removed.

File : replication_threads_api.cc

Modify function : initialize_channel

  change the the username and password value to NULL for the recovery channel
  configuration.


Global Changes for the test :
==============================

      
src/plugin/group_replication/tests/mtr/inc/group_replication_configuration.inc
    
  # Set distributed recovery user.
  CHANGE MASTER TO MASTER_USER='root' FOR CHANNEL "group_replication_recovery";

    
src/plugin/group_replication/tests/mtr/inc/group_replication_clear_configuration
.inc
    
    # Clean distributed recovery user.
    RESET SLAVE ALL FOR CHANNEL "group_replication_recovery";