WL#10803: MySQL GCS: Support name resolution in whitelist

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
======================
This worklog implements support for hostnames in the
GCS/XCom connection whitelisting feature. As a result,
users will be able to configure the whitelist not only
with IP addresses but also with hostnames.

DESCRIPTION
======================

In cloud environments, the following scenarios are quite common:
- Machines frequently join and leave a group;
- IP addresses are obtained from a pool that is dedicated to some machines.

In these cases, it is quite hard to maintain a proper whitelist, since the
addresses change constantly.

Since names are already supported in peer addresses and in local addresses, it
was requested that names also be supported in the whitelist parameter, for
example: www.randomname.com/16.

User Stories:
- As a DBA, I want to be able to configure the whitelist parameter using
hostnames instead of IP addresses so that I am able to allow connections from
outside hosts that do not have a fixed address.

- As a DBA, I want to be able to configure the whitelist parameter using a mix
of hostnames and IP addresses with netmasks so that I am able to allow
connections from outside hosts that do not have a fixed address and still allow
fixed IPs to connect to the group.

FR-1: It shall be possible to configure the whitelist using names in addition to
physical IP addresses.

FR-2: It shall be possible to configure the whitelist using a mix of names
and IP addresses.

FR-3: Specifying ranges or single addresses using hostnames MUST
observe the same format rules that the configuration with
IP addresses observes.

FR-4: Whitelist name resolution will happen at runtime, whenever a connection
arrives.

FR-5: If a name is not resolvable, it will not be taken into account for
whitelist validation.

FR-6: If a name is not resolvable, a warning must be written to the error log.

FR-7: After successful name resolution, IP whitelist verification will follow
the same validation path as if a regular IP had been configured.

1. Introduction
========================

Currently, GR/GCS allows configuring a whitelist of addresses that
determines which IPs and/or ranges of IPs are authorized to communicate
with each member of the group.

Over time, GR has been deployed by several customers in a variety of
environments. Some deployments have particularities that require
GCS/GR to be configured differently than before,
mainly because machines do not keep the same physical
address between restarts. In addition, the IPs are taken from a
pool of available addresses.

Both local_address and peer_address already accepted names
as parameters, but the whitelist did not. This caused issues, since nodes that
were previously configured with a certain IP could not reconnect to the group
after a restart with an address change.

2. Support for hostnames in the whitelist
==============================

To meet this demand, hostname support will be added to the whitelist
parameter. An example could be:

SET GLOBAL group_replication_ip_whitelist="mylocaldomain.com/8,8.9.10.0/20,192.168.1.1,192.168.2.0/24";

This construct makes it possible to configure a name and a mask in the
whitelist parameter.

It requires an additional verification, which is to check whether the name is
valid and resolvable. If it is, the resolved address goes through the existing
validation, exactly as a configured IP address would.

When showing the final value of group_replication_ip_whitelist, the server will
keep the value as it was configured, since name resolution only happens when
a new connection arrives. At that point the name is resolved and, if it is not
resolvable, it is not considered for whitelist validation.
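
As an illustration of the format described above, the following is a minimal
sketch of how a whitelist value could be split into entries and classified as
IP literals or hostnames, without resolving anything at configuration time.
Whitelist_item, is_ipv4_literal and parse_whitelist are hypothetical names
introduced here; they are not taken from the GCS sources, and the sketch does
not trim whitespace or validate the mask.

```cpp
#include <arpa/inet.h>   // inet_pton (POSIX)
#include <sys/socket.h>  // AF_INET
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one configured whitelist entry.
struct Whitelist_item {
  std::string addr;  // IP literal or hostname, exactly as configured
  std::string mask;  // text after '/', empty when no mask was given
  bool is_hostname;  // true when addr is not an IPv4 literal
};

// Returns true when `text` parses as a dotted-quad IPv4 literal.
inline bool is_ipv4_literal(const std::string &text) {
  in_addr parsed;
  return inet_pton(AF_INET, text.c_str(), &parsed) == 1;
}

// Splits a comma-separated whitelist value into entries.
// Names are only classified here; they are not resolved.
inline std::vector<Whitelist_item> parse_whitelist(const std::string &value) {
  std::vector<Whitelist_item> items;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token.empty()) continue;  // tolerate stray commas
    const std::string::size_type slash = token.find('/');
    Whitelist_item item;
    item.addr = token.substr(0, slash);  // whole token when there is no '/'
    item.mask = (slash == std::string::npos) ? "" : token.substr(slash + 1);
    // Sanity check: the address part never contains the mask separator.
    assert(item.addr.find('/') == std::string::npos);
    item.is_hostname = !is_ipv4_literal(item.addr);
    items.push_back(item);
  }
  return items;
}
```

With the example value above, this sketch would yield four entries, of which
only mylocaldomain.com/8 is flagged as a hostname; the configured text is kept
untouched so that it can be shown back to the user as it was entered.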

3. IPv6 name resolution
==============================

Since GCS/XCom does not support IPv6, any name that translates exclusively to
an IPv6 address will be skipped in the verification phase.
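
One possible way to obtain this behaviour is to ask the system resolver for
IPv4 results only, so that a name with no IPv4 address simply yields nothing.
The sketch below uses the POSIX getaddrinfo() call; the helper name
resolve_ipv4_only and the Ipv4_octets alias are assumptions made for this
example and do not come from the GCS/XCom code.

```cpp
#include <netdb.h>        // getaddrinfo, freeaddrinfo
#include <netinet/in.h>   // sockaddr_in
#include <sys/socket.h>   // AF_INET, SOCK_STREAM
#include <array>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

using Ipv4_octets = std::array<unsigned char, 4>;

// Hypothetical helper: resolves `name` to IPv4 addresses only.
// An empty result means the name is unresolvable or has no IPv4 address.
inline std::vector<Ipv4_octets> resolve_ipv4_only(const std::string &name) {
  std::vector<Ipv4_octets> result;

  addrinfo hints{};
  hints.ai_family = AF_INET;        // ask the resolver for IPv4 results only
  hints.ai_socktype = SOCK_STREAM;  // avoid one result per socket type

  addrinfo *list = nullptr;
  if (getaddrinfo(name.c_str(), nullptr, &hints, &list) != 0) return result;

  for (const addrinfo *it = list; it != nullptr; it = it->ai_next) {
    if (it->ai_family != AF_INET) continue;  // defensive: skip anything else
    assert(it->ai_addrlen >= sizeof(sockaddr_in));
    sockaddr_in sa;
    std::memcpy(&sa, it->ai_addr, sizeof(sa));
    // s_addr is stored in network byte order, so a plain copy yields the
    // octets in their usual left-to-right order.
    Ipv4_octets octets;
    std::memcpy(octets.data(), &sa.sin_addr.s_addr, octets.size());
    result.push_back(octets);
  }
  freeaddrinfo(list);
  return result;
}
```

A caller would treat an empty vector as "skip this entry", which covers both
the unresolvable case and the IPv6-only case described above.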

4. Observability
=============

Name resolution takes place when a new connection requests validation.
Only then is it validated whether the configured name matches an existing IP
address. If the name is not resolvable, a warning is written to the error log:
"[GCS] Warning: the server was unable to resolve '%s'. This address will not be included in the whitelist."
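
The flow described in this section can be sketched as follows, under several
simplifying assumptions: the resolver and the log sink are injected as
callbacks, addresses are compared as dotted-quad text, and netmasks are left
out. Resolver, unresolvable_warning and is_allowed are hypothetical names used
only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical resolver hook: returns the IPv4 addresses (as dotted-quad
// text) a configured entry maps to, or an empty vector when it cannot be
// resolved.
using Resolver = std::function<std::vector<std::string>(const std::string &)>;
using Log_sink = std::function<void(const std::string &)>;

// Builds the warning text quoted above for an unresolvable entry.
inline std::string unresolvable_warning(const std::string &name) {
  return "[GCS] Warning: the server was unable to resolve '" + name +
         "'. This address will not be included in the whitelist.";
}

// Returns true when `incoming_ip` matches any resolvable entry.
// Unresolvable entries are reported through `log` and then skipped.
inline bool is_allowed(const std::string &incoming_ip,
                       const std::vector<std::string> &entries,
                       const Resolver &resolve, const Log_sink &log) {
  assert(resolve && log);
  for (const std::string &entry : entries) {
    const std::vector<std::string> ips = resolve(entry);
    if (ips.empty()) {
      log(unresolvable_warning(entry));
      continue;  // not taken into account for this validation
    }
    for (const std::string &ip : ips) {
      if (ip == incoming_ip) return true;
    }
  }
  return false;
}
```

Because resolution happens on every call, a name that fails to resolve for one
connection attempt is tried again on the next one.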
      
5. User Interface
==============

The visible user impact of this change is the ability to configure names in the
already existing whitelist parameter. An example is:

SET GLOBAL group_replication_ip_whitelist="mylocaldomain.com/8,8.9.10.0/20,192.168.1.1,192.168.2.0/24";


6. Deployment / Install
====================

No new files will be added and no new components/plugins will be installed.

7. Protocol
========

This has no effects in the GCS/XCom protocol.

8. Security
==============================

Using hostnames without care might increase the attack surface that the
whitelist itself was meant to restrict. This might happen for two
reasons:
- A hostname does not restrict which hosts might actually connect, thus
  opening a door for any host that uses that name. Combined with the
  possibility of using netmasks, this broadens the scope that was previously
  restricted.
- The server becomes indirectly vulnerable to DNS spoofing attacks.

The recommendation is to use hostnames only when strictly necessary and to make
sure that all components involved in name resolution, such as DNS servers, are
secure and under your control. Another option is to make name resolution
local via the hosts file, thus avoiding the usage of external components.

An additional verification that will be implemented is one that already
exists in the MySQL client library: forward-confirmed reverse DNS (FCrDNS).
After resolving the name, one checks whether the resulting IP has that name
associated with it. Besides helping to detect DNS errors, this also establishes
a valid relationship between the IP and the name. For more details, see:
https://en.wikipedia.org/wiki/Forward-confirmed_reverse_DNS
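
The check described above can be illustrated with a small sketch in which the
forward and reverse lookups are injected as callbacks, so that the logic can
be exercised without a real DNS. Dns_lookups and confirmed_ips are
hypothetical names, and names are compared exactly, which is a simplification
(DNS names are case-insensitive). This is not the code used by the MySQL
client library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical lookup hooks.
struct Dns_lookups {
  // name -> IP addresses (forward lookup)
  std::function<std::vector<std::string>(const std::string &)> forward;
  // IP address -> names (reverse lookup)
  std::function<std::vector<std::string>(const std::string &)> reverse;
};

// Resolves `name` forward, then keeps only those IPs whose reverse lookup
// maps back to `name`.
inline std::vector<std::string> confirmed_ips(const std::string &name,
                                              const Dns_lookups &dns) {
  assert(dns.forward && dns.reverse);
  std::vector<std::string> confirmed;
  for (const std::string &ip : dns.forward(name)) {
    const std::vector<std::string> names = dns.reverse(ip);
    if (std::find(names.begin(), names.end(), name) != names.end()) {
      confirmed.push_back(ip);
    }
  }
  return confirmed;
}
```

Only the confirmed addresses would then be handed to the regular IP
validation; an address whose reverse record points elsewhere is dropped.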

9. Upgrade/Downgrade
=============================

Upgrade brings no issues. However, if you downgrade to a version that does not
support hostnames and keep the configuration,
there will be an error when those settings are parsed at runtime.

1. Modifications
=======================

1.1 Modifications to the Whitelist class

1.1.1 Whitelist contents

Currently, the whitelist object has two containers:
- One where the actual value that was provided is stored;
- Another one which is an octet cache of the IP values

This will be changed to a list of polymorphic objects: a container
of whitelist entries. Those entries will be IPs or hostnames, implemented in
derived classes, and they will have a virtual method named get_value()
that provides the resolved IP in octet form.

The main difference is that, in the hostname implementation, name resolution
occurs when get_value() is called, whereas the IP implementation returns a
cached IP value in octet format.

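#include <string>
#include <utility>
#include <vector>

// Base class for one whitelist entry (an IP address or a hostname).
class Gcs_ip_whitelist_entry
{
public:
  Gcs_ip_whitelist_entry(std::string addr, std::string mask);

  virtual ~Gcs_ip_whitelist_entry();

  // Initialization hook, implemented by each derived class.
  virtual bool init_value() = 0;

  // Returns the entry's value in octet form.
  virtual std::pair< std::vector<unsigned char>,
                     std::vector<unsigned char> > *get_value() = 0;

  // Address and mask exactly as they were configured.
  std::string get_addr() const { return m_addr; }
  std::string get_mask() const { return m_mask; }

  // Ordering between entries.
  bool operator<(const Gcs_ip_whitelist_entry& other);
  bool operator<(const Gcs_ip_whitelist_entry*& other);
private:
  std::string m_addr;
  std::string m_mask;
};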
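
To make the difference between the two derived classes concrete, here is a
minimal sketch under simplified assumptions. Entry_sketch is a reduced
stand-in for the base class above (the mask is fixed to /32 and get_value()
fills an output parameter instead of returning a pointer), the resolver is
injected as a callback, and all names ending in _sketch are hypothetical.

```cpp
#include <arpa/inet.h>   // inet_pton (POSIX)
#include <sys/socket.h>  // AF_INET
#include <cassert>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Octets = std::vector<unsigned char>;
using Addr_and_mask = std::pair<Octets, Octets>;
// Hypothetical resolver hook: hostname -> dotted-quad IPv4 text ("" on failure).
using Name_resolver = std::function<std::string(const std::string &)>;

// Converts dotted-quad text to four octets; returns false on malformed input.
inline bool ipv4_text_to_octets(const std::string &text, Octets *out) {
  in_addr parsed;
  if (inet_pton(AF_INET, text.c_str(), &parsed) != 1) return false;
  out->resize(sizeof(parsed.s_addr));
  std::memcpy(out->data(), &parsed.s_addr, out->size());
  return true;
}

// Reduced stand-in for the base class.
class Entry_sketch {
 public:
  explicit Entry_sketch(std::string addr) : m_addr(std::move(addr)) {}
  virtual ~Entry_sketch() = default;
  virtual bool init_value() = 0;
  // Fills *out and returns true on success.
  virtual bool get_value(Addr_and_mask *out) = 0;
  const std::string &get_addr() const { return m_addr; }

 private:
  std::string m_addr;
};

// IP entry: converts once in init_value(), then serves the cached octets.
class Ip_entry_sketch : public Entry_sketch {
 public:
  using Entry_sketch::Entry_sketch;
  bool init_value() override {
    m_cached.second.assign(4, 0xFF);
    return ipv4_text_to_octets(get_addr(), &m_cached.first);
  }
  bool get_value(Addr_and_mask *out) override {
    if (m_cached.first.empty()) return false;  // init failed or never ran
    *out = m_cached;
    return true;
  }

 private:
  Addr_and_mask m_cached;
};

// Hostname entry: nothing is cached; the name is resolved on every call.
class Hostname_entry_sketch : public Entry_sketch {
 public:
  Hostname_entry_sketch(std::string name, Name_resolver resolver)
      : Entry_sketch(std::move(name)), m_resolver(std::move(resolver)) {}
  bool init_value() override { return true; }
  bool get_value(Addr_and_mask *out) override {
    assert(m_resolver);
    out->second.assign(4, 0xFF);
    return ipv4_text_to_octets(m_resolver(get_addr()), &out->first);
  }

 private:
  Name_resolver m_resolver;
};
```

The design choice illustrated here is that callers only see get_value(); the
cost of a lookup is paid by hostname entries alone, and only when a connection
is actually being validated.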

1.1.2 Whitelist operations

The whitelist will need further modifications, since add_address has to become
a factory method that chooses between a hostname entry and an IP entry.

A function needs to be extracted to get the IP/mask octets, since this operation
is now used in two different places.

Finally, do_check_block now needs to work with the new whitelist entries instead
of the plain octets it used before.
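
For reference, the octet-level comparison that a block check of this kind
relies on can be sketched as below. This is an illustration of generic IPv4
netmask matching, not the actual do_check_block code; prefix_to_mask and
matches_block are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Octets = std::vector<unsigned char>;

// Expands a prefix length (for example 20) into IPv4 netmask octets
// (255.255.240.0).
inline Octets prefix_to_mask(unsigned prefix_bits) {
  assert(prefix_bits <= 32);
  Octets mask(4, 0);
  for (std::size_t i = 0; i < mask.size(); ++i) {
    // Number of leading bits set in this octet: 8 until the prefix runs out.
    const unsigned bits = prefix_bits >= 8 ? 8 : prefix_bits;
    mask[i] = static_cast<unsigned char>(0xFFu << (8 - bits));
    prefix_bits -= bits;
  }
  return mask;
}

// True when `incoming` and `range` agree on every bit selected by `mask`.
inline bool matches_block(const Octets &incoming, const Octets &range,
                          const Octets &mask) {
  if (incoming.size() != range.size() || range.size() != mask.size()) {
    return false;
  }
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if ((incoming[i] & mask[i]) != (range[i] & mask[i])) return false;
  }
  return true;
}
```

With entries that resolve to octets on demand, the same comparison applies
whether the range came from an IP literal or from a resolved hostname.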

1.2 Create resolve_ip_addr_from_hostname

A new function that will replace the old get_ipv4_addr_from_hostname. Its
purpose is to resolve a name to a physical address.

1.3 Change string_to_sockaddr

Modify this method to perform name resolution before trying to convert from
string to sockaddr. It shall return an error in case of name resolution failure.

1.5 Testing

Modify gr_ip_whitelist_options.test in order to incorporate names in the tests
that already exist, meaning:
- Testing valid name
- Testing invalid name
- Testing IPv6 integration